First $lastType is unknown during plan phase, planner should generate such a way that it will work for all possible values.
Above predicate is complex it has constants and dynamic value as previous statement it will generate generic plan that can work all values.
First OR, Second OR is same. Because META().id is string and can’t be empty string. So (SUBSTR(META(sg).id,0) > “”) always true.
First OR, Second OR is always false because sg.type will never be “”, it is AND with (sg.type IN [“location”]) these are contradictory predicates.
So try with sg.type IN [“location”] AND sg.type > $lastType
In this case Optimizer deiced to use one of the value for index scan and apply other post indescan. Depends on what value is used how many false positives it can impact query latencies. This has been addressed by MB-52873
OR sg.type IN [“location”] AND sg.type >= GRATEST(“location”, SUCCESSOR($lastType))
Use one of the following alternatives.
Drive from application
Use $lastType = in $documentTypes and when query returns less than your LIMIT move to next Type and set all others to “”
WHERE sg.`type` = $lastType
AND ((IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = $lastLocation AND SUBSTR(META(sg).`id`,0) > $lastKey)
OR (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > $lastLocation))
OR
WITH newTypes AS (ARRAY v FOR v IN $documentTypes SATISFIES WHEN v > $lastType END)
SELECT sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") AS `location`, meta(sg).`id`
FROM `sync_gateway_sw1` AS sg
WHERE (((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = $lastLocation) AND (SUBSTR(META(sg).`id`,0) > $lastKey))
OR ((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > $lastLocation))
OR sg.`type` IN newTypes)
AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
AND (META(sg).`id` NOT LIKE "_sync:%")
ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff"), SUBSTR(META(sg).`id`,0)
LIMIT 100;
OR
Application can remove the type processed from $documentTypes (i.e. when lastType changed remove from the named parameter) and use following
SELECT sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") AS `location`, meta(sg).`id`
FROM `sync_gateway_sw1` AS sg
WHERE (((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = $lastLocation) AND (SUBSTR(META(sg).`id`,0) > $lastKey))
OR ((sg.`type` = $lastType) AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > $lastLocation))
OR sg.`type` IN $documentTypes)
AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
AND (META(sg).`id` NOT LIKE "_sync:%")
ORDER BY sg.`type`, IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff"), SUBSTR(META(sg).`id`,0)
LIMIT 100;
Is it possible to say exactly what is happening in Couchbase (perhaps in the Index server) during this 25-second period? Is it performing a full index scan?
Why don’t I see this 25-second delay in my second iteration onwards? The query doesn’t change in the second iteration, therefore the query plan should be the same. The second iteration returns in less than 100ms.
Thanks for workaround, but I’m not sure if I can apply that in all our scenarios.
Above this clearly explains, when used named parameter used it starts with named parameter(“”) and stops at location. I.e. it means index scans doing thousands and millions of thems vs start the grater of $lastType, “location” due to multiple ANDs of same predicate.
You should have known that when you look profile timings and look #itemsOut/#itemsIn
When query is parameterized lower range is set to “low”: “$lastType” by the Query planner and the index scanner starts the scan from empty string “”, which means it scans every document in front of location type to reach the first location document in the index.
But when the query is un-parameterized lower range is set as “low”: “"location"”, therefore index scanner directly seeks to the first location document.
Just curious – can’t the Query Planner be a bit smarter when parameterized? Since my parameterized query has sg.type IN [“location”] , can’t it set “low”: “location” ?
WHERE (sg.`type` IN ["location"])
AND (((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = "") AND (SUBSTR(META(sg).`id`,0) > ""))
OR ((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > ""))
OR (sg.`type` > $lastType))
This has been addressed in future release by MB-52873
As i mentioned remove use the work around.
try this see it uses “location”
WHERE (((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") = "") AND (SUBSTR(META(sg).`id`,0) > ""))
OR ((sg.`type` = "") AND (IFMISSINGORNULL(sg.`locationRoot`, sg.`locationId`, sg.`location`, "\uefff") > ""))
OR (sg.`type` > $lastType))
AND (sg.`type` IN ["location"])