Hello @vsri
Here is the new thread:
We are trying to find the most optimum way to JOIN several documents:
Given:
Document 1:
Bucket users
inst::XYZ
{
“uid”: 11,
“name”: "ABC,
“tp”: “inst”,
}
Bucket places
Document 2:
u::1
{
“id”:1,
“lstname”:“vaca”
“tp”:“u”
}
Indexes:
CREATE INDEX idx_baker1 ON users(uid,name) WHERE (tp = ‘inst’) USING GSI;
CREATE INDEX idx_baker2 ON places (lstname,id) WHERE (tp = ‘u’) USING GSI;
Original Queries:
Query 1:
select inst.name, inst.uid, u.id
from users inst join places u on keys [‘u:’||TO_STRING(inst.uid)]
where inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid = ‘abc’ and u.lstname = ‘vaca’
Query 2:
select inst.name, inst.uid, u.id
from users inst join places u on keys (select raw meta(u2).id from users u2 where u2.tp = ‘u’ and u.lstname = ‘vaca’ )
where u.id = inst.uid and inst.tp = ‘inst’ and u.tp = ‘u’ and inst.uid = ‘abc’
Couchbase suggested Queries:
Alternative A:
“The following avoids JOINs and uses covered indexes. Check how it performs.”
SELECT inst.name, inst.uid, pu.id FROM users inst
LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = ‘u’ AND u.lstname = ‘vaca’)[0]
WHERE inst.tp = ‘inst’ AND inst.uid = ‘abc’ AND pu.mid = u:’||TO_STRING(inst.uid);
SELECT inst.name, inst.uid, u2.id
LET pu = (SELECT u.id, META(u).id AS mid FROM places u WHERE u.tp = ‘u’ AND u.lstname = ‘vaca’)
FROM users inst UNNEST pu AS u2
WHERE inst.tp = ‘inst’ AND inst.uid <> ‘abc’ AND u2.id = inst.uid;
Alternative B:
CREATE INDEX idx_baker3 ON users(“u::”||uid,uid,name) WHERE (tp = ‘inst’) USING GSI;
SELECT inst.name, inst.uid, u.id
FROM places u
JOIN users inst ON KEY “u::”||inst.uid FOR places
WHERE u.tp = ‘u’ AND u.lstname = ‘vaca’ AND inst.tp = ‘inst’ AND inst.uid = ‘abc’;
SELECT inst.name, inst.uid, u.id
FROM places u
JOIN users inst ON KEY “u::”||inst.uid FOR places
WHERE u.tp = ‘u’ AND u.lstname = ‘vaca’ AND inst.tp = ‘inst’ AND inst.uid = ‘abc’ AND u.id = inst.uid;
Originally queries and Alternative queries seem to have a very similar plans.
We moved both sets of documents to the same buckets, but were not able to get a covered index.
We tried creating the following index on all predicates of Alternative A
CREATE INDEX idx_type_baker3 ON users (tp,uid,name,id,lstname) WHERE (tp= ‘inst’ or tp = ‘u’) USING GSI;
A) does this index make sense? B) what would happen if the object e.g “name” existed in both documents? C) is it a good practice to have WHERE (tp= ‘inst’ or tp = ‘u’ or … ) in the indexx
Alternative B
The second query does not match the scope of the second original query. Additionally, we were not able to join more than two documents using the ON …KEY …FOR, which is a bit of a limitation.
D) Can you multiple JOINs with this syntax?
E) Please help us understand which one of the 3 sets of queries is most efficient (speed and cost)?
F) In this context, does it make sense to merge all buckets into one bucket to make everybody’s life easier
Many thanks for your assistance in advanced,
BG