the limit & offset clause is always applied as the last operation in sequence by the optimizer. Giving for example a set of documents, with the structure as below:
- a primary index on id field
- and a secondary index on tag field
issuing the query:
select a.* from default a where tag =‘somevalue’ offset x, limit y
when x grows to a large value retrieval time increase proportionally to x value.
Cause: the optimizer choose to use the secondary index, then does a very fast index scan but try to retrieve all documents by key from the buckets, this until x record position is reached, which in this situation is not needed because the server can only scan the first x entries only from the index then starting fetching documents for y documents instead of fetching x+y documents. this overload the server heavily.
Workaround found: rewrite the query by joining the main query with a subquery that does only metadata fetch operations, something like this:
select a.* from default a
join (select meta(b).id from default b where tag = ‘somevalue’ offset x limit y) on keys (…)
This way the optimizer choose only to fetch keys values from index, (it’s covered) then generate a sequence of y keys starting from x position, then join the outer query with resulting key(s) sequence, and does bucket retrieval.
So, the number of documents fetched is always y and not x+y
Is there a way to make the optimizer recognize this situation by doing (in this case) only y document fetches starting from index key data at x value?
this will allow to avoid the trick above.
Which version are you using? There are optimizations in 4.1 and 4.5, respectively.
4.5. It’s easy to test this. just upload 100000 documents with the structure above and play with indexes (gsi) based on tag.
a simple explain plan reveal the problem.
select b.* from b offset 90000 limit 1000
take a lot longer than
select b.* from b offset 0 limit 1000
this is not because of reduced index scan in the second example but reduced document gets from data service
(it’s true that there is a little penalty on index scan but this is expected. What is not expected is the high number of get operations in the first scenario, because i expect 1000 gets not 90000+1000.)
Go it. Yes, we will be pushing OFFSET down to indexer. Here is the JIRA ticket: https://issues.couchbase.com/browse/MB-18404
Please note that it’s a particular case.
if the query does include in the where predicate some kind of expression or field NOT included in the gsi index the current behaviour is correct
Hi @Old8088, Good finding and investigation into pagination queries.
Here’s a way for you to improve the query (actually document fetch) performance a bit. Yes, it does require you to write a nested subquery. But, if the milliseconds are precious, this will save additional time by fetching the documents faster.
FROM (select array_agg(t0.id) as keylist from
(select meta().id as id from
travel-sample where type = ‘route’ offset 20000 limit 100) as t0
) as t1
travel-sample t2 on keys t1.keylist;
thanks kenshav_m but in my issue report i’ve already done this, see below the keyword “Workaround” in my previous post, i’ve done exactly this.
Hi @Old8088 Great. Observe the usage of array_agg() in the query, resulting in a faster KV fetch.
@kenshaw_m thank you for hint