N1QL limit & offset optimization issue

Old8088 · July 18, 2016, 2:49pm

the limit & offset clause is always applied as the last operation in sequence by the optimizer. Giving for example a set of documents, with the structure as below:
{
tag:‘ABC’,
id:0,
info:‘test’
}

a primary index on id field
and a secondary index on tag field

issuing the query:
select a.* from default a where tag =‘somevalue’ offset x, limit y

when x grows to a large value retrieval time increase proportionally to x value.
Cause: the optimizer choose to use the secondary index, then does a very fast index scan but try to retrieve all documents by key from the buckets, this until x record position is reached, which in this situation is not needed because the server can only scan the first x entries only from the index then starting fetching documents for y documents instead of fetching x+y documents. this overload the server heavily.

Workaround found: rewrite the query by joining the main query with a subquery that does only metadata fetch operations, something like this:

select a.* from default a
join (select meta(b).id from default b where tag = ‘somevalue’ offset x limit y) on keys (…)

This way the optimizer choose only to fetch keys values from index, (it’s covered) then generate a sequence of y keys starting from x position, then join the outer query with resulting key(s) sequence, and does bucket retrieval.
So, the number of documents fetched is always y and not x+y

Is there a way to make the optimizer recognize this situation by doing (in this case) only y document fetches starting from index key data at x value?
this will allow to avoid the trick above.

thx

geraldss · July 18, 2016, 4:05pm

Which version are you using? There are optimizations in 4.1 and 4.5, respectively.

Old8088 · July 18, 2016, 10:23pm

4.5. It’s easy to test this. just upload 100000 documents with the structure above and play with indexes (gsi) based on tag.
a simple explain plan reveal the problem.
basically
a
select b.* from b offset 90000 limit 1000
take a lot longer than
select b.* from b offset 0 limit 1000

this is not because of reduced index scan in the second example but reduced document gets from data service

(it’s true that there is a little penalty on index scan but this is expected. What is not expected is the high number of get operations in the first scenario, because i expect 1000 gets not 90000+1000.)

geraldss · July 18, 2016, 10:25pm

Go it. Yes, we will be pushing OFFSET down to indexer. Here is the JIRA ticket: https://issues.couchbase.com/browse/MB-18404

Old8088 · July 18, 2016, 11:00pm

Please note that it’s a particular case.
if the query does include in the where predicate some kind of expression or field NOT included in the gsi index the current behaviour is correct

geraldss · July 18, 2016, 11:04pm

Noted …

keshav_m · July 18, 2016, 11:23pm

Hi @Old8088, Good finding and investigation into pagination queries.

Here’s a way for you to improve the query (actually document fetch) performance a bit. Yes, it does require you to write a nested subquery. But, if the milliseconds are precious, this will save additional time by fetching the documents faster.

SELECT t2
FROM (select array_agg(t0.id) as keylist from
(select meta().id as id from travel-sample where type = ‘route’ offset 20000 limit 100) as t0
) as t1
INNER JOIN travel-sample t2 on keys t1.keylist;

Old8088 · July 19, 2016, 12:06am

thanks kenshav_m but in my issue report i’ve already done this, see below the keyword “Workaround” in my previous post, i’ve done exactly this.

keshav_m · July 19, 2016, 12:15am

Hi @Old8088 Great. Observe the usage of array_agg() in the query, resulting in a faster KV fetch.

Old8088 · July 19, 2016, 5:58am

@kenshaw_m thank you for hint

Topic		Replies	Views
Performance Issue - Large Number of Documents Couchbase Server n1ql	2	1170	June 20, 2020
N1QL query Performance is getting low irrespective of iterations SQL++ n1ql	11	2757	January 29, 2016
Offset and Limit SQL++ query , n1ql	1	608	April 26, 2021
Issues on querying data SQL++ query , java , n1ql	4	578	October 3, 2022
N1QL query performance with LIMIT and OFFSET Couchbase Server query , n1ql	2	7639	April 30, 2020

N1QL limit & offset optimization issue

Related topics