Covering index for nested array

cmantelo · June 6, 2019, 4:08pm

Hi all,
In Couchbase v5.5, we have the following documents in our bucket test_bucket:

{
  "type": "type_1",
  "outerArray": [
     {
        "innerArray": [
             {
                 "idItem": 25,
                 "relevance": 60
             },
             {
                 "idItem": 46,
                 "relevance": 70
             }
        ]
     },
     {
        "innerArray": [
             {
                 "idItem": 10,
                 "relevance": 45
             },
             {
                 "idItem": 99,
                 "relevance": 100
             }
        ]
     }
  ]
}

Our query is this:

SELECT meta(b).id
FROM test_bucket b
WHERE ANY o IN b.outerArray SATISFIES
    ANY i IN o.innerArray SATISFIES 
       i.idItem IN [25, 46] AND i.relevance >= 50
    END
END

We tried this index that does not cover the previous query:

CREATE INDEX idx ON test_bucket(DISTINCT ARRAY (DISTINCT ARRAY [i.idItem, i.relevance] FOR i IN o.innerArray END) FOR o IN outerArray END)

Please, can you tell us how to create a covering index for this query?

chang.liu · June 6, 2019, 4:29pm

Hi @cmantelo,

CREATE INDEX idx10 ON test_bucket(DISTINCT ARRAY (DISTINCT ARRAY i.idItem FOR i IN o.innerArray WHEN i.relevance >= 50 END) FOR o IN outerArray END)

cmantelo · June 7, 2019, 8:34am

Hi @chang.liu
The query uses this new index, but this index does not cover the query.
This is the execution plan:

   {
      "plan": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "DistinctScan",
            "scan": {
              "#operator": "IndexScan3",
              "as": "b",
              "index": "idx10",
              "index_id": "7075df3fdd801e40",
              "index_projection": {
                "primary_key": true
              },
              "keyspace": "test_bucket",
              "namespace": "default",
              "spans": [
                {
                  "range": [
                    {
                      "high": "25",
                      "inclusion": 3,
                      "low": "25"
                    }
                  ]
                },
                {
                  "range": [
                    {
                      "high": "46",
                      "inclusion": 3,
                      "low": "46"
                    }
                  ]
                }
              ],
              "using": "gsi"
            }
          },
          {
            "#operator": "Fetch",
            "as": "b",
            "keyspace": "test_bucket",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "any `o` in (`b`.`outerArray`) satisfies any `i` in (`o`.`innerArray`) satisfies (((`i`.`idItem`) in [25, 46]) and (50 <= (`i`.`relevance`))) end end"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "(meta(`b`).`id`)"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      "text": "SELECT meta(b).id\r\nFROM test_bucket b\r\nWHERE ANY o IN b.outerArray SATISFIES\r\n    ANY i IN o.innerArray SATISFIES \r\n       i.idItem IN [25, 46] AND i.relevance >= 50\r\n    END\r\nEND"
    }

Any ideas for creating a covering index?

Thanks!

chang.liu · June 7, 2019, 4:03pm

Hi @cmantelo ,
I tried this query with this index and it should be covering, could you please upgrade the server version you are using? v5.5 doesn’t support it.

vsr1 · June 8, 2019, 3:09pm

This query is little complex as you have multiple predicates on array elements as both values can be changed.
If you need to cover this best option will be include array as index key at the end of the index. https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html (implicit covered array index is not possible, as this possible in limited cases)

cmantelo · June 10, 2019, 9:10am

Hi @chang.liu ,
We are currently running v5.5 Enterprise in our corporate production environments, with many buckets and indexes. It’s not possible for us to upgrade.

Thanks!

cmantelo · June 10, 2019, 9:21am

Hi @vsr1 ,

The documentation examples explain how to create covering index for arrays, but not for nested arrays.
We tried this index but it doesn’t cover the query:

CREATE INDEX idx ON test_bucket(DISTINCT ARRAY (DISTINCT ARRAY [i.idItem, i.relevance] FOR i IN o.innerArray END) FOR o IN outerArray END, outerArray[*].innerArray)

Any advice on how to create an index with nested array as index key?

Thanks for your help.

vsr1 · June 10, 2019, 1:13pm

As I mentioned in my previous post It is difficult create covering index for nested arrays when you have more than one level and type of queries you have.

If you have support level aggrement open support case will help through that channel.

If not post sample document, query and mention what are constant and what can vary each query. Will try if any covering index is possible.

cmantelo · June 10, 2019, 3:33pm

Hi @vsr1 ,
Ok, I will ask if we have support level agreement and try to open a case with support.

Thanks for your help!

vsr1 · June 10, 2019, 4:47pm

@cmantelo,

You can try the following

CREATE INDEX idx ON test_bucket(DISTINCT ARRAY (DISTINCT ARRAY [i.idItem, i.relevance] FOR i IN o.innerArray END) FOR o IN outerArray END) WHERE type = "type_1";

SELECT meta(b).id
FROM test_bucket b
WHERE b.type = "type_1" AND
     ANY o IN b.outerArray
     SATISFIES (ANY i IN o.innerArray
                SATISFIES ([i.idItem, i.relevance] >= [25, 50] AND [i.idItem, i.relevance] < [26]) OR
                           ([i.idItem, i.relevance] >= [46, 50] AND [i.idItem, i.relevance] < [47])
                END)
     END;

As i mentioned these are too complex and you need to aware of how to write proper query on array comparisons other wise result in wrong results.

For given index the index key is [i.idItem, i.relevance] i.e array of two elements first idItem, 2nd relevance
The query must use same combination as comparison operation to qualify index and push the predicates to indexer.
If every thing equal it is easy ( [i.idItem, i.relevance] = [ 25, 50] )non equal case predicates must be properly written.

If that doesn’t work as covering, try UNION query which represent one OR clause of satisfies.

SELECT meta(b).id
FROM test_bucket b
WHERE b.type = "type_1" AND
     ANY o IN b.outerArray
     SATISFIES (ANY i IN o.innerArray
                SATISFIES  [i.idItem, i.relevance] >= [46, 50] AND [i.idItem, i.relevance] < [47]  END)
     END
UNION
SELECT meta(b).id
FROM test_bucket b
WHERE b.type = "type_1" AND
     ANY o IN b.outerArray
     SATISFIES (ANY i IN o.innerArray
                SATISFIES [i.idItem, i.relevance] >= [25, 50] AND [i.idItem, i.relevance] < [26] END)
     END;

Other option is index single key and store array as part of index

CREATE INDEX idx ON test_bucket(DISTINCT ARRAY (DISTINCT ARRAY i.idItem FOR i IN o.innerArray END) FOR o IN outerArray END, outerArray  ) WHERE type = "type_1";
SELECT meta(b).id
FROM test_bucket b
WHERE b.type = "type_1" AND
     ANY o IN b.outerArray
     SATISFIES (ANY i IN o.innerArray
                SATISFIES i.idItem IN [25, 46] AND  i.relevance >= 50 END)
     END;

cmantelo · June 15, 2019, 4:01pm

Hi @vsr1 ,
After some testing with several queries and indexes, the index that offered the best results was this:
CREATE INDEX idx ON test_bucket(outerArray) WHERE type = "type_1"

We decided to use this index because we changed the structure of our test documents and the previous index covers our new queries.

Our new document structure is this:

{
  "type": "type_1",
  "outerArray": [
     {
        "innerArray1": [
             {
                 "idItem": 25,
                 "relevance": 60
             },
             {
                 "idItem": 46,
                 "relevance": 70
             }
        ],
        "innerArray2": [
             {
                 "idItem": 37,
                 "relevance": 10
             },
             {
                 "idItem": 225,
                 "relevance": 25
             }
        ],
        "innerArray3": [
             {
                 "idItem": 334,
                 "relevance": 80
             },
             {
                 "idItem": 6,
                 "relevance": 90
             }
        ]
     },
     {
        "innerArray1": [
             ...
        ],
        "innerArray2": [
             ...
        ],
        "innerArray3": [
             ...
        ]
     }
  ]
}

Our new queries are like this:

SELECT meta(b).id 
FROM test_bucket b 
WHERE 
  ANY o IN b.outerArray SATISFIES 
   ANY i IN o.innerArray1 SATISFIES i.idItem IN [25, 46] AND i.relevance >= 50 END
   AND ANY i IN o.innerArray2 SATISFIES i.idItem IN [50, 75] AND i.relevance < 50 END
   AND ANY i IN o.innerArray3 SATISFIES i.idItem IN [11, 32, 37] AND i.relevance >= 50 END
  END
LIMIT 49
OFFSET 3000

There are more queries, with OR clauses instead of AND clauses, so we decided to use the previous index instead of creating several different indexes because the previous index covers all our queries.

Thanks for your help!

Topic		Replies	Views
Index creation on nested structures makes the query fail SQL++ query , n1ql , index	4	726	February 10, 2021
Couchbase: Create Index on Array containing Array of objects SQL++ query , n1ql	1	729	April 19, 2022
Index is not covered when array is used in where clause - Couchbase 4.5 SQL++ index	4	903	June 30, 2019
Array Indexing in Couchbase 4.5 SQL++	1	1318	September 9, 2016
Covering Indexes in JOIN query using UNNEST SQL++	4	1236	March 22, 2019

Covering index for nested array

Related topics