Array index never used even when forced

dagneytaggartsv · March 15, 2017, 11:54am

I have a document that looks like this:


{
"chr":"something",
"start":1,
"annot": { "ct":["name":"ct1","so":[123,345],  "name":"ct2", "so":[456,567] ] }
}

I created an index on the “so” array that you see above thusly:


CREATE INDEX iannot_so_sample ON docsample (annot, annot.ct, annot.ct.so, DISTINCT ARRAY (DISTINCT ARRAY obj1 FOR obj1 IN obj.so END) FOR obj in annot.ct END);

However I can never see IndexScan when I try to query the elements of the “so” array. For example, the following explain statement always says it is doing “PrimaryScan”


explain select c.so from docsample v use index(iannot_so_sample) unnest v.annot as a unnest a.ct as c where ANY s in c.so satisfies s = 1627 END;

Is this a bug? I am using Couchbase 4.5 CE btw.

atom_yang · March 15, 2017, 1:46pm

it is not a valid json document.

and you should use the same variable name in the N1QL with INDEX definition.

for your example,try this

explain select obj.so from docsample v use index(iannot_so_sample) unnest v.annot as a unnest a.ct as obj where ANY obj1 in obj.so satisfies obj1 = 1627 END;

it will looks like as following

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "DistinctScan",
        "scan": {
          "#operator": "IndexScan",
          "covers": [
            "cover ((`v`.`annot`))",
            "cover (((`v`.`annot`).`ct`))",
            "cover ((((`v`.`annot`).`ct`).`so`))",
            "cover ((distinct (array (distinct (array `obj1` for `obj1` in (`obj`.`so`) end)) for `obj` in ((`v`.`annot`).`ct`) end)))",
            "cover ((meta(`v`).`id`))"
          ],
          "index": "iannot_so_sample",
          "index_id": "bdbe5701bd7b7266",
          "keyspace": "default",
          "namespace": "default",
          "spans": [
            {
              "Exact": true,
              "Range": {
                "High": [
                  "{}"
                ],
                "Inclusion": 1,
                "Low": [
                  "[]"
                ]
              }
            }
          ],
          "using": "gsi"
        }
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Unnest",
              "as": "a",
              "expr": "cover ((`v`.`annot`))"
            }
          ]
        }
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Unnest",
              "as": "obj",
              "expr": "(`a`.`ct`)"
            }
          ]
        }
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "any `obj1` in (`obj`.`so`) satisfies (`obj1` = 1627) end"
            },
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "(`obj`.`so`)"
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  },
  "text": "select obj.so from default v use index(iannot_so_sample) unnest v.annot as a unnest a.ct as obj where ANY obj1 in obj.so satisfies obj1 = 1627 END;"
}

dagneytaggartsv · March 15, 2017, 4:26pm

There was a minor typo in the document. The corrected version is below. Regardless, the suggested query still doesn’t make use of the index. Also, my understanding was that “placeholder aliases” (such as obj and obj1) don’t matter and don’t need to be consistent across the index definition and the actual query. If not, are you saying that anyone querying this document must be aware of the exact placeholder alias that was used in the index definition? That doesn’t look like a sound design decision to me…


{"chr":"something","start":1,"annot": { "ct":[{"name":"ct1","so":[123,345]}, {"name":"ct2", "so":[456,567]} ] }}

The “EXPLAIN” statement that you suggested gave me this output:


{
 "requestID": "ff0b50df-6331-437c-b3d3-3e16862f3cf0",
 "signature": "json",
 "results": [
 {
 "plan": {
 "#operator": "Sequence",
 "~children": [
 {
 "#operator": "PrimaryScan",
 "index": "#primary",
 "keyspace": "docsample",
 "namespace": "default",
 "using": "gsi"
 },
...
...

May be this is a bug. Steps to reproduce below:

./couchbase-cli bucket-create --bucket=onedoc_bucket --bucket-ramsize=100 --bucket-replica=0 -c localhost:8091 -u [username] -p [password]
insert into onedoc_bucket (KEY, VALUE) values (“k001”,{“chr”:“something”,“start”:1,“annot”: { “ct”:[{“name”:“ct1”,“so”:[123,345]}, {“name”:“ct2”, “so”:[456,567]} ] }});
create primary index on onedoc_bucket;
CREATE INDEX iannot_so_exp ON onedoc_bucket (annot, annot.ct, annot.ct.so, DISTINCT ARRAY (DISTINCT ARRAY obj1 FOR obj1 IN obj.so END) FOR obj in annot.ct END);
explain select obj.so from onedoc_bucket v use index(iannot_so_exp) unnest v.annot as a unnest a.ct as obj where ANY obj1 in obj.so satisfies obj1 = 1627 END;

atom_yang · March 15, 2017, 7:21pm

“placeholder aliases” do matter for array index.

The variables used in query predicates ( where-clause) of subsequent SELECT/UPDATE/UPSERT/DELETE statements must be the same as those used in the above array_expression. FYI .

annot is a object ,not a array, so you can not use UNNEST. FYI.

so you should use this N1QL

select obj.so from onedoc_bucket v  unnest v.annot.ct AS obj where ANY obj1 in obj.so satisfies obj1 = 123 END;

with this index

 CREATE INDEX iannot_so_exp ON onedoc_bucket (annot, annot.ct, DISTINCT ARRAY (DISTINCT ARRAY obj1 FOR obj1 IN obj.so END) FOR obj in annot.ct END);

dagneytaggartsv · March 16, 2017, 9:18am

Created the index as you had suggested. Explain statement on the N1QL that you had suggested still gives:


{
 "requestID": "f76cfefa-7214-47c8-a421-14c6832ab769",
 "signature": "json",
 "results": [
 {
 "plan": {
 "#operator": "Sequence",
 "~children": [
 {
 "#operator": "PrimaryScan",
 "index": "#primary",
 "keyspace": "onedoc_bucket",
 "namespace": "default",
 "using": "gsi"
 },
....
....
....

atom_yang · March 16, 2017, 9:43am

I am using CB 4.6 DP, and it works.
cc @geraldss @prasad

prasad · March 17, 2017, 10:16am

Hi @atom_yang, the query might work and give results. Can you check if EXPLAIN of your query is using the index.

Hi @dagneytaggartsv,
try using following index and query:

CREATE INDEX iannot_so_exp1 ON default (DISTINCT ARRAY 
                          (DISTINCT ARRAY obj1 FOR obj1 IN obj.so END) 
                          FOR obj in annot.ct END,  annot.ct);

explain select obj.so from default v  
unnest v.annot.ct AS obj 
unnest obj.so as obj1 
where  obj1 = 123 ;

…
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan”,
“covers”: [
“cover ((distinct (array (distinct (array obj1 for obj1 in (obj.so) end)) for obj in ((v.annot).ct) end)))”,
“cover (((v.annot).ct))”,
“cover ((v.annot))”,
“cover ((meta(v).id))”
],
“index”: “iannot_so_exp1”,
“index_id”: “27c322d9131d40ce”,
“keyspace”: “default”,

Basically, it fixes two issues in your queries:

To qualify an index for a query, it is necessary for the query/SELECT to have a where clause predicates that include the various index-keys in the order they are defined in CREATE INDEX. This is a generic index keys prefix-matching requirement for any composite indexes that use b-tress (including all traditional databases). You will see more details in docs, but for now, you can refer to some detail I provided in Covering index range spans - #3 by prasad

In your query, the predicate does not use the index keys annot and annot.ct in where-clause predicates. In the above index def, I changed the order of index keys (note, I also remove ‘annot’ as that isn’t needed to cover this query). To use your index, the query should have additional predicates using ‘annot’ and ‘annot.ct’, something like:
'annot' IS NOT MISSING and 'annot.ct' IS NOT MISSING

your SELECT query predicate should use nested UNNESTs or nested ANY constructs in the where clause to use the nested array index. AFAIK, you can’t use UNNEST in one level, and ANY in another level. For ex: following query with nested ANYs would also work fine:
explain select meta(v).id from default v
where ANY obj in v.annot.ct satisfies
(ANY obj1 in obj.so satisfies obj1 = 123 END) END;
…
“#operator”: “IndexScan”,
“covers”: [
“cover ((distinct (array (distinct (array obj1 for obj1 in (obj.so) end)) for obj in ((v.annot).ct) end)))”,
“cover (((v.annot).ct))”,
“cover ((v.annot))”,
“cover ((meta(v).id))”
],
“index”: “iannot_so_exp1”,
“index_id”: “27c322d9131d40ce”,
“keyspace”: “default”,
hth,
-Prasad

dagneytaggartsv · March 17, 2017, 1:13pm

Prasad,

Thanks for the detailed explanation. Your suggestion worked! Appreciate your help!

Topic		Replies	Views
Array index not working SQL++	5	716	February 14, 2020
How to properly index array field? Couchbase Server n1ql , index	3	1904	June 21, 2017
Problem with Array Indexing SQL++	6	2243	April 9, 2016
Array Covering index with UNNEST and condition SQL++ n1ql , index	30	4058	April 16, 2018
Query analyzer doesn't use index specified by "use index" statement SQL++	7	1375	June 12, 2021

Array index never used even when forced

Related topics