Using Multiple Covering Indexes At Same Time Forces Primary Index Scan?

connorwb93 · September 11, 2020, 8:55pm

I have a bucket with 2m documents, out of which all but 180k are audits (have “<AUDIT>” partway through their ID). I have a primary index, and also have 2 other indexes, both with the same filter requirements that exclude audit documents

 CREATE INDEX device_model ON bucket(lower((device_info.model))) WHERE (((meta().id) like "device---%") and (not contains((meta().id), "<AUDIT>"))) WITH { "num_replica":1 } 
 CREATE INDEX device_make ON bucket(lower((device_info.make))) WHERE (((meta().id) like "device---%") and (not contains((meta().id), "<AUDIT>"))) WITH { "num_replica":1 }

If I run a query that uses either one of these two as a covering index, eg

select meta().id from bucket
where meta().id like "device---%"
and not contains(meta().id, "<AUDIT>")
and lower(device_info.make) = "test"
offset 100000
limit 100

,it works great, and responds in ~35ms, only looking at that covering index.

If I do a query that uses both, eg

select meta().id from bucket
where meta().id like "device---%"
and not contains(meta().id, "<AUDIT>")
and lower(device_info.make) = "test"
and lower(device_info.model) = "test"
offset 100000
limit 100

it takes like 16 seconds. In the explain plan it is doing an intersect scan with not only both secondary indexes but also the primary index, which requires iterating over all 2m documents in that index.

So… why is this? Why can’t it just intersect the 2 highly selective covering indexes? Any help appreciated.

(all doc IDs are prefixed with a doc type of device---, so a doc ID might be device---test1 or device---test1<AUDIT>143243). Also this is server v5.5.1.

vsr1 · September 11, 2020, 9:04pm

Use composite index like below

 CREATE INDEX device_make_model ON bucket(lower((device_info.make)),lower(device_info.model)  ) 
WHERE (((meta().id) like "device---%") and (not contains((meta().id), "<AUDIT>"))) WITH { "num_replica":1 }

connorwb93 · September 11, 2020, 9:06pm

Thanks for the response. The problem with this solution is we want to enable searching on any one of a number of attributes together, there are actually several more attribute indexes like this. But we don’t want to have composite indexes for every combination. And I am still confused why in our case it’s using the primary index at all.

vsr1 · September 11, 2020, 9:11pm

GSI index is based on btree. If you want single key indexes will not perform well because indexscan on single key and fetch document apply filters on other fields.

Other option for your USE case is FTS index.

connorwb93 · September 11, 2020, 9:50pm

Well this is not the most optimal set of indexes for this query. But it should perform a lot better than it currently is, by simply dropping the primary index scan, unless I’m misunderstanding something.

vsr1 · September 11, 2020, 11:05pm

Couchbase rule based optimizer it doesn’t know selectivity. IntersectScan uses early termination vs pure intersect (explained here ). If IntersectScan not performing (it based on selectivity which doesn’t know), avoid Intersect Scan by providing USE INDEX hint.

When ever it uses IntersectScan, UnionScan query can’t be covered because indexes maintained asynchronously and each will have different snapshots.

Instead of offset 100K explore keyset pagination https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/