Covering array index not avoiding FETCH

Hello,

Need some help to understand why is the index NOT a covered index for the query.

Couchbase server version is 6.0 & 6.6. codes is an array.

Index: CREATE INDEX idxccndc ON config((distinct (array ((v.cc) || (v.ndc)) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”);

Query:
SELECT count(*)
FROM config x
WHERE
ANY v in codes satisfies position(“30697111”,v.cc||v.ndc) = 0
AND v.primary = 1 END
AND x.__t = “irdb-net”;

We tried various combinations mentioned below but could not do away with the FETCH.
(1) created 3 array indices one with each element as the key
CREATE INDEX idxpri ON config((distinct (array (v.primary)) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”);
CREATE INDEX idxcc ON config((distinct (array (v.cc)) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”);
CREATE INDEX idxndc ON config((distinct (array (v.ndc)) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”);

(2) created a single array index with all 3 elements
CREATE INDEX idxpriccndc ON config(distinct array [v.primary,v.cc,v.ndc] for v in codes when (v.primary) = 1 end) WHERE (__t = “irdb-net”);

(3) created a single array index with all 3 elements
CREATE INDEX
idxpriccndc ON
config(distinct array [(v.primary),(v.cc),(v.ndc)] for v in codes when (v.primary) = 1 end)
WHERE (__t = “irdb-net”);

Thanks

If all you’re looking for is to cover this particular COUNT, then:

CREATE INDEX ix_eg1 on `config`(`__t`) FOR v IN `codes` END) WHERE ANY v IN `codes` SATISFIES
 position("30697111",v.`cc`||v.`ndc`) = 0 AND v.`primary` = 1 END AND `__t` = "irdb-net";

would do as all the query filters exactly match the index filters.

@vsr1 will probably have a more useful suggestion for you.

In version 7.1 the FLATTEN_KEYS (https://issues.couchbase.com/browse/MB-32506) enhancement achieves a more useful index:

CREATE INDEX ix_eg2 ON default(DISTINCT ARRAY FLATTEN_KEYS(v.`cc`,v.`ndc`) FOR v IN `codes` WHEN v.`primary` = 1 END) WHERE `__t` = "irdb-net";

that covers ‘cc’ and ‘ndc’ where ‘primary’ is 1 directly (and thus covers this query and others with similar filters).

CREATE INDEX ix_eg3 ON default(DISTINCT ARRAY FLATTEN_KEYS(v.`cc`,v.`ndc`,v.`primary`) FOR v IN `codes` END) WHERE `__t` = "irdb-net";

Is another less restrictive 7.1+ alternative.

HTH.

Thanks for your response @dh but that string (“30697111”) will be the input to the n1ql. So it cannot be included in the index.

You can’t probe a constant with an index scan - that is, if you have to match the field concatenation against the string (as opposed to a string against the concatentation) it can’t be pushed down so ends up requiring the fetch (there would be no advantage in the index structure as all keys would have to be checked).

(Your condition has cc = “” and ncc = “” as a match; if they’re actually expected to match the entire constant then an equality or LIKE predicate with indexing the concatenation would work.)

Unfortunately in 6.x you can’t index multiple nested fields and all your attempts result in objects that have to be matched in their entirety and are not short-cuts to indexing nested individual fields.

The 7.1 flatten keys enhancement addresses this sort of usage by allowing multiple individual nested fields to be indexed as separate keys.

Thanks @dh

We have tried LIKE but that too does not work. And I guess that is because, in our case, the string is always longer than the concatenation, so we cannot append the % to the string; instead we have to append % to the field concatenation and hence the fields are on the RHS. e.g.
“30697111” LIKE v.cc||v.ndc||‘%’

Basically, we know that there will only be 1 document if at all. So, we are actually ok with the FETCH as far as indexscan returns only that 1 record. But somehow that is not happening for e.g. we tried below query & index, and ItemsOut is same as all items in the index as a result of which the number of documents being fetched from Data service are high and hence the filter takes long:
ANY v IN codes SATISFIES (v.primary) = 1
AND v.cc = SUBSTR(‘3069711’,0,length(v.cc))
AND v.ndc = SUBSTR(‘3069711’,length(v.cc),length(v.ndc))
END

CREATE INDEX idxpri ON config((distinct (array (v.primary) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”)

CREATE INDEX idxcc ON config((distinct (array (v.cc) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”)

CREATE INDEX idxndc ON config((distinct (array (v.ndc) for v in codes when ((v.primary) = 1) end))) WHERE (__t = “irdb-net”)

When there are only 5 cc values that satisfy the condition, why does indexscan still return all 1055 IDs from the index? Similarly, when there are only 7 ndc values that satisfy the condition, why does indexscan still return all IDs from the index?

ANY clause can cover only when all the predicates are pushed to indexer and no false positives.

See if the following covers.

CREATE INDEX ix1 ON config((ALL ARRAY code.cc || code.ndc FOR code IN codes WHEN code.primary = 1 END) WHERE __t = "irdb-net";

SELECT COUNT(DISTINCT META(x).id)
FROM config x
UNNEST x.codes AS code
WHERE POSITION("30697111", code.cc||code.ndc) = 0
      AND code.primary = 1
      AND x.__t = "irdb-net";

OR

SELECT COUNT(1)
FROM (SELECT DISTINCT RAW META(x).id
      FROM config x
      UNNEST x.codes AS code
      WHERE POSITION("30697111", code.cc||code.ndc) = 0
            AND code.primary = 1
            AND x.__t = "irdb-net") AS x1;

@pccb, not that it matters now that @vsr1 has chimed in with something that works (UNNEST), but the LIKE would have to be the other way around - that’s what I was trying to note. If you must find the concatenation in the constant (const LIKE field||field) rather than the constant in the concatenation (field||field LIKE const) you can’t use ANY to avoid the fetch.

thanks a ton @vsr1 . Its super! The FETCH is gone!!

I see 2 differences:
(1) ANY to UNNEST - what is the difference?
(2) DISTINCT vs ALL - why did that matter?

May be that first sentence in your last response explains, but I didn’t quite get it. So would really appreciate if you can help understand. Thanks again! :smile:

UNNEST is self JOIN with ARRAY elements.
ANY is, any one in the document.

Array Index leading ARRAY key can use IndexScan for UNNEST .
To Cover IndexScan for UNNEST you need ALL index because (DISTINCT index eliminates duplicates with in array per document bases (too save space) vs ALL keep duplicates) self join can’t eliminate duplicates.

As you doing Aggregate we use ALL index and do DISTINCT META().id so that it is equivalent of the ANY clause can expose covering.

INSERT INTO default VALUES ("k01",{"arr":[{"a":"abc"}, {"a":"abc"}, {"a":"xyz"}]});

CREATE INDEX ix1 ON default (DISTINCT ARRAY v.a FOR v IN arr END);

If you check index it has 2 entries as it eliminated duplicates with in ARRAY due to DISTINCT index

  cbindex -auth user:password -type=scanAll -bucket default -index ix1
         ScanAll index:
         ["abc"] ... k01
         ["xyz"] ... k01
         Total number of entries:  2

CREATE INDEX ix2 ON default (ALL ARRAY v.a FOR v IN arr END);

If you check index it has 3 entries as it keep duplicates with in ARRAY due to ALL index

   cbindex -auth user:password -type=scanAll -bucket default -index ix2
         ScanAll index:
         ["abc"] ... k01
         ["abc"] ... k01
         ["xyz"] ... k01
         Total number of entries:  3
SELECT COUNT(1)
FROM default AS d
WHERE ANY v IN d.arr SATISFIES v.a LIKE "%b%" END;

Above query EXPLAIN shows this. The predicates not pushed to indexer and false positives possible and query need to reapply predicate. As index doesn’t have d.arr and predicate referred it can’t reconstruct from index entries. So required Fetch.

          "index": "ix1",
          "spans": [ {
                                "range": [
                                    {
                                        "high": "[]",
                                        "inclusion": 1,
                                        "low": "\"\""
                                    }
                                ]
                            }
                        ]
SELECT COUNT( DISTINCT MEA(d).id)
FROM default AS d
UNNEST d.arr AS v
WHERE v.a LIKE "%b%";

As this Self JOIN and default d, UNNEST d.arr AS v together used Unnest IndexScan. Except UNNEST d.arr it never used array any where all predicates and projection can be used single row from index It can use covering.
If it is DISTINCT index it lost rows (UNNEST needs to preserve duplicates) it must do Fetch.

                  "index": "ix2",
                    "spans": [
                        {
                            "range": [
                                {
                                    "high": "[]",
                                    "inclusion": 1,
                                    "low": "\"\""
                                }
                            ]
                        }
                    ],