Index Scan of Documents with Missing Attributes

polfer · April 28, 2021, 7:27pm

Assume you have an index like this:

CREATE INDEX `type_idx` ON `shadows`(`type`,`subtype`,`org`)

The type is always present and valued, but subtype and org may not be. In fact, subtype may be missing when org is valued.

With a query like this:

  SELECT META(s).`id`
    FROM `shadows` AS s
   WHERE (s.`type` = "sample-type")
     AND (s.`subtype` IS MISSING)
     AND (s.`org` = "03387460-a1b2-42dd-8325-65b2ba6bde3f")
   LIMIT 1000
  OFFSET 0

You’ll get what looks, to me, like a reasonable range scan (in fact, a pretty bounded one):

  "spans": [
    {
      "exact": true,
      "range": [
        {
          "high": "\"sample-type\"",
          "inclusion": 3,
          "low": "\"sample-type\""
        },
        {
          "high": "null",
          "inclusion": 0
        },
        {
          "high": "\"03387460-a1b2-42dd-8325-65b2ba6bde3f\"",
          "inclusion": 3,
          "low": "\"03387460-a1b2-42dd-8325-65b2ba6bde3f\""
        }
      ]
    }
  ]

My assumption - perhaps bad - from this EXPLAIN is that MISSING gets indexed and sorted before “null” in the index (hence the high of null with inclusion 0). My understanding from other posts is also that MISSING does get indexed if the attribute is not the first attribute in the index. In other words, type has to be valued but subtype and org do not have to be. Assuming all this is true, then the above range scan should be in good form and be quick.

When run against a relatively large index, however, we see this run very slowly. In fact, it is behaving exactly as though I’m getting a FULL scan of every row of type “sample-type” - like there is no cutting down to the org. Can someone help me understand why the above might yield a really big range scan?

Thank you!

vsr1 · April 28, 2021, 7:43pm

Your assumption on first paragraph is right.

Indexing experts able to answer the second paragraph. cc @jeelan.poola

polfer · May 4, 2021, 1:35pm

Thanks for the confirmation on assumptions! Still hoping there’s some insight on the slowdown from @jeelan.poola if they have a moment.

Topic		Replies	Views
Counting/Grouping Based on Var in Array Expression (View like Grouping)? SQL++ array-query	2	1549	August 26, 2019
N1QL index without where clause SQL++	3	995	March 6, 2019
Interesting Index Behavior SQL++	3	467	July 4, 2022
IS MISSING Clause not getting picked up in indexing SQL++ n1ql , index	2	2685	June 9, 2020
Secondary index entries for docs with missing key values Couchbase Server query , n1ql	10	4873	July 19, 2016

Index Scan of Documents with Missing Attributes

Related topics