Recommended number of buckets?

yjjy2021 · February 10, 2017, 8:10am

Hi.

I couchbase 4.5.

The recommended number of buckets?

Please answer thank you.

daschl · February 10, 2017, 8:17am

You can create a maximum of 10, but we recommend “the less the better”, but it also depends on your application. Think of a bucket as a “database” in an RDBMS and not as a “table”, it also manages your resources (ram, size,…).

Baker · July 17, 2017, 1:25pm

Hello daschl,
We have multiple buckets in CB. The data was separated based on logical reasons. Now the issue we are running into is that we lost the ability to create cover indexes because an index cannot be created on multiple buckets and most of our queries cross at least 2 buckets! Is there a way to create a cover index on data sitting in multiple buckets? Consolidating all the data in one bucket appears to be the only solution if query speed is of the essence?
Thanks,
BG

vsr1 · July 17, 2017, 1:37pm

If the queries cross at least 2 buckets means you are using JOIN’s. One of the bucket can be covered and others will do non-covered.

If combine multiple buckets you still want to data keep separated by logical reason (by separating type field in the document) the indexes might be created on that logical boundary (i.e. same as before) for performance reason. Unless you change data model and embedded the documents.

Please post query and index you have will see if we can help. What is CB version. May be separate post.

Baker · July 18, 2017, 12:12am

vsr1:

Thank you for the prompt response;

We primarily have two types of joins:

1- explain select inst.name, inst.uid, u.id
from users inst join places u on keys [‘u:’||TO_STRING(inst.uid)]
where inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid = ‘abc’ and u.lstname = ‘vaca’

2- select inst.name, inst.uid, u.id
from users inst join places u on keys (select raw meta(u2).id from users u2 where u2.tp = ‘u’ and u.lstname = ‘vaca’ )
where u.id = inst.uid and inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid <> ‘abc’

We have the following indexes:

1- CREATE INDEX idx_baker1 ON users(uid,name) WHERE (tp = ‘inst’) USING GSI;
2- CREATE INDEX idx_baker2 ON places (lstname,id) WHERE (tp = ‘u’) USING GSI;
Additionally for the second query above we have a cover index for its subquery.

The explain below uses idx_baker 1 as a cover index, which is perfect. What happens next is not very clear! Idx_baker2 does not appear to be used neither does the cover index of the subquery. Also there is no “fetch” of any document yet somehow we are able to get the u.id from the select?! Could you please help us understand how the join executed. Do not hesitate to go into the details.

Would we able to get a cover index across the join if we put the documents in the same bucket (say users) and create an index that would cover all of predicates? something like:

CREATE INDEX idx_type_baker2 ON users(tp,uid,name,id,lstname) WHERE (tp = ‘inst’ or tp = ‘u’) USING GSI;

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“covers”: [
“cover ((inst.uid))”,
“cover ((inst.name))”,
“cover ((meta(inst).id))”
],
“filter_covers”: {
“cover ((inst.tp))”: “inst”
},
“index”: “idx_baker1”,
“index_id”: “88dc4f6bd9116f12”,
“keyspace”: “users”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“abc”"
],
“Inclusion”: 0,
“Low”: [
“null”
]
}
},
{
“Range”: {
“Inclusion”: 0,
“Low”: [
"“abc”"
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Join”,
“as”: “u”,
“keyspace”: “places”,
“namespace”: “default”,
“on_keys”: “(select raw (meta(u2).id) from users as u2 where (((u2.tp) = “u”) and ((u2.lstname) = “vaca”)))”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((((u.id) = cover ((inst.uid))) and (cover ((inst.tp)) = “inst”)) and (not (cover ((inst.uid)) = “abc”)))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((inst.name))”
},
{
“expr”: “cover ((inst.uid))”
},
{
“expr”: “(u.id)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “select inst.name, inst.uid, u.id \nfrom users inst join places u on keys (select raw meta(u2).id from users u2 where u2.tp = ‘u’ and u2.lstname = ‘vaca’)\nwhere u.id = inst.uid and inst.tp = ‘inst’ and inst.uid <> ‘abc’”
}
]

Many thanks in advanced,

B

vsr1 · July 18, 2017, 1:18am

Hi @Baker,

Could you please create separate thread( with this information) for this post because it doesn’t match with subject.
Is it possible post sample documents only fields that referenced by query
do you have any relation from places to users i.e places document contain the document key of users
Couchbase Version
can you confirm subquery of second query is it places or users?

idx_baker2 is used because EXPLAIN follows subqueries in FROM clause not all others so it is not displayed.

vsr1 · July 18, 2017, 5:59am

Unless you embedded one document in another you still need join which will makes difficult to cover. Based on information you provided so far have following suggestions. Try and see which one works best for you.

Your original Indexes.

CREATE INDEX idx_baker1 ON users(uid, name) WHERE (tp = 'inst') USING GSI;
CREATE INDEX idx_baker2 ON places (lstname,id) WHERE (tp = 'u') USING GSI;

Both following original Lookup Join queries are optimized for your original indexes.

SELECT inst.name, inst.uid, u.id
    FROM users inst
    JOIN places u ON KEYS 'u:'||TO_STRING(inst.uid)
    WHERE inst.tp = 'inst' AND inst.uid = 'abc' AND u.tp = 'u' AND u.lstname = 'vaca';

SELECT inst.name, inst.uid, u.id
    FROM users inst
    JOIN places u ON KEYS (SELECT RAW META(u2).id FROM users u2 WHERE u2.tp = 'u' AND u.lstname = 'vaca')
    WHERE inst.tp = 'inst' AND inst.uid <> 'abc' AND u.id = inst.uid AND u.tp = 'u';

The following avoids JOINs and uses covered indexes. Check how it performs.

 SELECT inst.name, inst.uid, pu.id FROM users inst
    LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = 'u' AND u.lstname = 'vaca')[0]
    WHERE inst.tp = 'inst' AND inst.uid = 'abc' AND pu.mid = u:'||TO_STRING(inst.uid);

SELECT inst.name, inst.uid, u2.id
    LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = 'u' AND u.lstname = 'vaca')
    FROM users inst UNNEST pu AS u2
    WHERE inst.tp = 'inst' AND inst.uid <> 'abc' AND u2.id = inst.uid;

Following switch the order of JOINs and uses Index Join because you have better predicate on LEFT side.
In 4.6.0+ It should use covered index on both sides (In case of Index Joins).

CREATE INDEX idx_baker3 ON users("u::"||uid,uid,name) WHERE (tp = 'inst') USING GSI;
SELECT inst.name, inst.uid, u.id
    FROM places u
    JOIN users inst ON KEY "u::"||inst.uid FOR places
    WHERE u.tp = 'u' AND u.lstname = 'vaca' AND inst.tp = 'inst' AND inst.uid = 'abc';

SELECT inst.name, inst.uid, u.id
    FROM places u
    JOIN users inst ON KEY "u::"||inst.uid FOR places
    WHERE u.tp = 'u' AND u.lstname = 'vaca' AND inst.tp = 'inst' AND inst.uid <> 'abc' AND u.id = inst.uid;

Baker · July 18, 2017, 4:59pm

Thank you @vsr1 for the detailed response. I will test your queries and publish them in a new thread.

Going back to the main topic. We currently have 4 buckets. The buckets characteristics as follows:

Bucket 1: size would be large with few mutation rate and has around 10 “types”
Bucket 2: size is small with moderate mutation rate and has around 5 “types”
Bucket 3: size is huge with high mutation rate and has around 10 “types”
Bucket 4: size is huge with very high mutation rate and has around 3 “types”

Each of the buckets represents major functionalities of the app. none of these buckets can run independently of the others, at least with the current data structure.

Almost all queries cross minimum 2 buckets and 5 of our major GET queries cross all buckets.

The question is, does it make sense keeping them separate? or it is worth while putting everything in one bucket? query speed is very important to us.

Thanks for your assistance,

BG

vsr1 · July 18, 2017, 5:06pm

Unless you embedded one document in another you still need join it doesn’t matter if it is on same bucket or different bucket for queries. Please post couchbase version so that any recommendations can be made based on functionality.

Topic		Replies	Views
Couchbase tutorial and multiple bucket Couchbase Server	1	823	November 20, 2018
Should I be using multiple buckets? SQL++	1	1859	April 30, 2014
Does CouchBase have a limit on number of buckets? Couchbase Server	1	3381	March 22, 2013
Couchbase 6 bucket limit and implications Couchbase Server	8	1998	August 13, 2019
10 bucket limit introduced in Couchbase Server 2.0 Couchbase Server	4	6842	March 11, 2013

Recommended number of buckets?

Related topics