IntersectScan must do Fetch. And it re-applies the predicates. So the output of IntersectScan can have false positives. #itemsIn in IntersectScan is sum of the all the input sources.
N1QL optimizer uses rule based. Don’t know anything about how many items each input produces.
Due to that IntersectScan on the N1QL is little different than pure Intersect.
When ever any input is done it uses all items received from that input as possible candidate and terminate other inputs early vs wait for all the inputs do pure intersect. Some times this performs well (input produces 10, input 2 produces 1M - in this case it terminates input 2 early) and some times not ( input 1 produces 50,000 input 2 produces 40,000 - intersect is only 100).
If really need pure Intersect use SET INTERSECT ( Q1 INTERSECT Q2) for each query as covered and projecting document ids as USE KEYS.
If intersectScan performing poorly avoid it by using USE INDEX.
My understanding is that query node will make data requests from the indexer nodes for all qualifying indexes in an intersect scan, but will only use the first returning result set.
So these requests are done in parallel. (Correct?) My question is about load: even though the query node ignores all but the first returning results, the index node(s) either have to do all the work or else they have to do as much of the work as they get done before being told to stop if they are being told to stop.
How does this work (does the index node(s) do all the scan work, or are they told to stop?)
Are these scans done in parallel in a one index node deployment by default or not (version 6.0)
In the explain underneath IntersectScan there is IndexScan. All Index scans are done in parallel irrespective of indexer nodes. As explained previous post N1QL uses early termination (vs pure intersect) i.e when one indexscan completes it terminates all other indexscans under intersectscan and uses all the results from the indexscan as possible candidates and does fetch .
Some times this works better if one of the scan completes quickly. If not there might be lot work might be wasted. Avoid intersectScan specifying single index in USE INDEX hint.
Right, I’m trying to understand more about " it terminates all other indexscans under intersectscan"
If I’m understanding correctly, it’s possible that for a query requiring an intersectScan over, say, 3 secondary indexes, the indexer node(s) end up doing 3 times the work, depending on how early or late into the respective scan operations, they get terminated, and the more distributed the index nodes, the more likely that they do more work for a intersectScan.
If so this give us an additional incentive (in addition to unpredictable query response time) to avoid intersectScan.
It is possible. Some times it is much worse because it needs to fetch the documents.
As rule based decision can’t decide which index is better it does InterSectScan.
Based on data/patterns do experiment with one by one index for all index(s) in intersectscan and select best one, provide as USE INDEX hint.