Hi.
I couchbase 4.5.
The recommended number of buckets?
Please answer thank you.
Hi.
I couchbase 4.5.
The recommended number of buckets?
Please answer thank you.
You can create a maximum of 10, but we recommend “the less the better”, but it also depends on your application. Think of a bucket as a “database” in an RDBMS and not as a “table”, it also manages your resources (ram, size,…).
Hello daschl,
We have multiple buckets in CB. The data was separated based on logical reasons. Now the issue we are running into is that we lost the ability to create cover indexes because an index cannot be created on multiple buckets and most of our queries cross at least 2 buckets! Is there a way to create a cover index on data sitting in multiple buckets? Consolidating all the data in one bucket appears to be the only solution if query speed is of the essence?
Thanks,
BG
If the queries cross at least 2 buckets means you are using JOIN’s. One of the bucket can be covered and others will do non-covered.
If combine multiple buckets you still want to data keep separated by logical reason (by separating type field in the document) the indexes might be created on that logical boundary (i.e. same as before) for performance reason. Unless you change data model and embedded the documents.
Please post query and index you have will see if we can help. What is CB version. May be separate post.
vsr1:
Thank you for the prompt response;
We primarily have two types of joins:
1- explain select inst.name, inst.uid, u.id
from users inst join places u on keys [‘u:’||TO_STRING(inst.uid)]
where inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid = ‘abc’ and u.lstname = ‘vaca’
2- select inst.name, inst.uid, u.id
from users inst join places u on keys (select raw meta(u2).id from users u2 where u2.tp = ‘u’ and u.lstname = ‘vaca’ )
where u.id = inst.uid and inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid <> ‘abc’
We have the following indexes:
1- CREATE INDEX idx_baker1 ON users(uid
,name
) WHERE (tp
= ‘inst’) USING GSI;
2- CREATE INDEX idx_baker2 ON places (lstname
,id
) WHERE (tp
= ‘u’) USING GSI;
Additionally for the second query above we have a cover index for its subquery.
The explain below uses idx_baker 1 as a cover index, which is perfect. What happens next is not very clear! Idx_baker2 does not appear to be used neither does the cover index of the subquery. Also there is no “fetch” of any document yet somehow we are able to get the u.id from the select?! Could you please help us understand how the join executed. Do not hesitate to go into the details.
Would we able to get a cover index across the join if we put the documents in the same bucket (say users) and create an index that would cover all of predicates? something like:
CREATE INDEX idx_type_baker2 ON users(tp
,uid
,name
,id
,lstname) WHERE (tp
= ‘inst’ or tp
= ‘u’) USING GSI;
[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“covers”: [
“cover ((inst
.uid
))”,
“cover ((inst
.name
))”,
“cover ((meta(inst
).id
))”
],
“filter_covers”: {
“cover ((inst
.tp
))”: “inst”
},
“index”: “idx_baker1”,
“index_id”: “88dc4f6bd9116f12”,
“keyspace”: “users”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“abc”"
],
“Inclusion”: 0,
“Low”: [
“null”
]
}
},
{
“Range”: {
“Inclusion”: 0,
“Low”: [
"“abc”"
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Join”,
“as”: “u”,
“keyspace”: “places”,
“namespace”: “default”,
“on_keys”: “(select raw (meta(u2
).id
) from users
as u2
where (((u2
.tp
) = “u”) and ((u2
.lstname
) = “vaca”)))”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((((u
.id
) = cover ((inst
.uid
))) and (cover ((inst
.tp
)) = “inst”)) and (not (cover ((inst
.uid
)) = “abc”)))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((inst
.name
))”
},
{
“expr”: “cover ((inst
.uid
))”
},
{
“expr”: “(u
.id
)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “select inst.name, inst.uid, u.id \nfrom users inst join places u on keys (select raw meta(u2).id from users u2 where u2.tp = ‘u’ and u2.lstname = ‘vaca’)\nwhere u.id = inst.uid and inst.tp = ‘inst’ and inst.uid <> ‘abc’”
}
]
Many thanks in advanced,
B
Hi @Baker,
idx_baker2 is used because EXPLAIN follows subqueries in FROM clause not all others so it is not displayed.
Unless you embedded one document in another you still need join which will makes difficult to cover. Based on information you provided so far have following suggestions. Try and see which one works best for you.
Your original Indexes.
CREATE INDEX idx_baker1 ON users(uid, name) WHERE (tp = 'inst') USING GSI;
CREATE INDEX idx_baker2 ON places (lstname,id) WHERE (tp = 'u') USING GSI;
Both following original Lookup Join queries are optimized for your original indexes.
SELECT inst.name, inst.uid, u.id
FROM users inst
JOIN places u ON KEYS 'u:'||TO_STRING(inst.uid)
WHERE inst.tp = 'inst' AND inst.uid = 'abc' AND u.tp = 'u' AND u.lstname = 'vaca';
SELECT inst.name, inst.uid, u.id
FROM users inst
JOIN places u ON KEYS (SELECT RAW META(u2).id FROM users u2 WHERE u2.tp = 'u' AND u.lstname = 'vaca')
WHERE inst.tp = 'inst' AND inst.uid <> 'abc' AND u.id = inst.uid AND u.tp = 'u';
The following avoids JOINs and uses covered indexes. Check how it performs.
SELECT inst.name, inst.uid, pu.id FROM users inst
LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = 'u' AND u.lstname = 'vaca')[0]
WHERE inst.tp = 'inst' AND inst.uid = 'abc' AND pu.mid = u:'||TO_STRING(inst.uid);
SELECT inst.name, inst.uid, u2.id
LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = 'u' AND u.lstname = 'vaca')
FROM users inst UNNEST pu AS u2
WHERE inst.tp = 'inst' AND inst.uid <> 'abc' AND u2.id = inst.uid;
Following switch the order of JOINs and uses Index Join because you have better predicate on LEFT side.
In 4.6.0+ It should use covered index on both sides (In case of Index Joins).
CREATE INDEX idx_baker3 ON users("u::"||uid,uid,name) WHERE (tp = 'inst') USING GSI;
SELECT inst.name, inst.uid, u.id
FROM places u
JOIN users inst ON KEY "u::"||inst.uid FOR places
WHERE u.tp = 'u' AND u.lstname = 'vaca' AND inst.tp = 'inst' AND inst.uid = 'abc';
SELECT inst.name, inst.uid, u.id
FROM places u
JOIN users inst ON KEY "u::"||inst.uid FOR places
WHERE u.tp = 'u' AND u.lstname = 'vaca' AND inst.tp = 'inst' AND inst.uid <> 'abc' AND u.id = inst.uid;
Thank you @vsr1 for the detailed response. I will test your queries and publish them in a new thread.
Going back to the main topic. We currently have 4 buckets. The buckets characteristics as follows:
Bucket 1: size would be large with few mutation rate and has around 10 "types"
Bucket 2: size is small with moderate mutation rate and has around 5 "types"
Bucket 3: size is huge with high mutation rate and has around 10 "types"
Bucket 4: size is huge with very high mutation rate and has around 3 “types”
Each of the buckets represents major functionalities of the app. none of these buckets can run independently of the others, at least with the current data structure.
Almost all queries cross minimum 2 buckets and 5 of our major GET queries cross all buckets.
The question is, does it make sense keeping them separate? or it is worth while putting everything in one bucket? query speed is very important to us.
Thanks for your assistance,
BG
Unless you embedded one document in another you still need join it doesn’t matter if it is on same bucket or different bucket for queries. Please post couchbase version so that any recommendations can be made based on functionality.