LIMIT clause and Indexscan

Hello,

We have a query with LIMIT 100. There is no ORDER BY.

The query with WHERE clause with value-A returns 2 docs and takes ~25 seconds whereas with value-B it returns 100 docs but takes only 300 millisec. For value-B, there are ~99k docs in the database whereas for value-A there are only 2 docs. Execution Plan is identical for both.

On checking the plan stats, we see, that for value-A, ItemsOut for IndexScan & Fetch phase is 100k. I can understand - the planner would not be aware that there are only 2 docs until it performs a Fetch and hence until the Filter phase, ItemsIn/Out is the full data set i.e. 100k docs.
For value-B, ItemsOut for IndexScan is 2.5k!!! When value-A/value-B is not a part of the index, then how did it know that for value-B, even if it stops at 2.5k docs during the IndexScan phase, it will suffice? It baffles me! Can someone help understand pls?

Thanks

I’m guessing that the person answering this would want to see the query plan and plans stats.

As there is no ORDER BY, you want any 100 documents.
If all the predicates are pushed to indexer and no false positives possible the limit will be pushed to index scan and indexer only produces limit documents. This best case scenario.
Other wise query works as normal produces results. When final results reach limit it satisifes query semantics and stops the query. I,e in value-B

As in case of value-A if predicates not part of index and you can end up doing whole scan of collection because excepting more results present and query semantics not satisfied yet.

See https://www.couchbase.com/blog/wp-content/uploads/2017/10/N1QL-A-Practical-Guide-2nd-Edition.pdf Page 89

Best way is define index such that it has all predicates part of index keys.
Improvements in 7.1.3
https://issues.couchbase.com/browse/MB-51817
https://issues.couchbase.com/browse/MB-40499

Thanks @vsr1

The query including the predicate is identical. Only difference is that the value for one of the filters is different. e.g. field_x=A for the slower execution & field_x=B for the faster execution.

“field_x” is not a part of index but that is the case for both executions. So we expected that for both executions, the ItemsIn/Out would be same till the Fetch phase. But that is not the case.

That is expected.
Each operator works on different thread. Once limit reached it stops all earlier operators.

Thanks again @vsr1

its an aha! moment

By “thread”, believe you meant a couchbase thread? To make sure that I am getting this right:
Fetch will not wait for indexScan to complete. Instead, while IndexScan is still in progress, Fetch will get started for some of the keys returned by IndexScan. While Fetch is still in progress, Filter will start for some of the docs from the Fetch. At this point, if the count reaches 100, then it will ask Fetch & IndexScan to stop. Am I describing it correctly?

So, in case of field-x=A, with LIMIT 100, since there are only 2 docs, it will never reach the count of 100 and hence it has to wait for entire IndexScan to complete then entire Fetch to complete and so on. Whereas in case of field-x=B, with LIMIT 100, since there are approx. 99k docs, as soon as it hits a count of 100, all operations are stopped and does not have to wait for IndexScan/Fetch to complete.

Thanks

You got it. This is how most RDBMS also does they call different names operator/iterator/thread etc. In couchbase each operator is run on different go ( it uses golang) thread

FYI: This way better utilize CPUs, less memory usage, results can be streamed and low latencies. Some of operators are stop and go. i.e. Until it completes it can’t output row. Example ORDER BY, GROUP BY. last row can be first one.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.