Function ARRAY's errata

So my understanding is:
ARRAY variable.fields FOR variable IN arrayable-expression creates an array that might as well look like this:
[this, that, theother]

Unless you’re creating an index, then it creates a key for each member of the array.

Also, you can use the variable outside that expression, it’s declared by that expression.

Am I correct?

Range transforms (ARRAY, FIRST, OBJECT) allow you to map and filter the elements or attributes of a collection or object(s). ARRAY evaluates to an array of the operand expression, while FIRST evaluates to a single element based on the operand expression. OBJECT evaluates to an object whose name : value attributes are name-expr : expr .

Name-expr must evaluate to a string. If not, that attribute is omitted from the result object.

https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/collectionops.html

ARRAY expr FOR variable IN arrayable-expression WHEN cond END creates an array that might as well look like this: [this, that, theother]

variable can be used in the expr or cond
depends on arrayable-expression the variable can be value or object or anther array.

Same is true in create index also. In create index ALL|DISTINCT in front of ARRAY keyword each element of ARRAY will make different index row which points to same document.

https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/indexing-arrays.html

Example:

SELECT ARRAY v.name FOR v IN aa WHEN v.id >= 2 END AS a1, ARRAY v FOR v IN aa WHEN v.id = 2 END AS a2
LET aa = [{"name":"id1", "id":1}, {"name":"id2", "id":2}, {"name":"id3", "id":3}];

"results": [
    {
        "a1": [
            "id2",
            "id3"
        ],
        "a2": [
            {
                "id": 2,
                "name": "id2"
            }
        ]
    }
    ]

Right thanks, sorry for wasting your time. I’m aware of all that. What I’m talking about is more language design.

The behavior of the “ARRAY” function seems to be different when in a CREATE INDEX scope since it doesn’t evaluate to an object, but rather creates an an index key for every array member.

That is only when DISTINCT ARRAY or ALL ARRAY

Assume you have following document {“name”:“xyz”,“a1”:[1,1,2,3])

CREATE INDEX ix1 ON default (name, a1) ;
        Here second key is array scalar. 
       index will have one entry of  "xyz", [1,1,2,3]

CREATE INDEX ix2 ON default (name, ARRAY v FOR v IN a1 WHEN v >=2 END) ;
        Here second key is array scalar. 
       index will have one entry of  "xyz", [2,3]

CREATE INDEX ix3 ON default(name,  DISTINCT ARRAY v FOR v IN a1END) ;
        index will have  3 entries of
                  "xyz",  1
                  "xyz",  2
                  "xyz",  3

CREATE INDEX ix3 ON default(name,  ALL ARRAY v FOR v IN a1END) ;
        index will have  4 entries of
                  "xyz",  1
                  "xyz",  1
                  "xyz",  2
                  "xyz",  3
1 Like

That’s interesting and a bit un-intuitive, but it seems impossible to retrieve the results of an individual index entry even if you use the DISTINCT or ARRAY keyword. If I create an example like yours, and I want to GROUP BY individual array values, it seems like I can’t actually do that?

I think I can still use the v outside of the ARRAY expression though?

CREATE INDEX ix4 ON default(name, DISTINCT ARRAY lower(v) FOR v IN a1 END), v);

Can I have covering if I query for

SELECT `name`, `v` WHERE `name` IS NOT MISSING AND (..array_ish.. lower(`v`) = "whatever");

and I have aggregate via index for

SELECT count(`name`),`v` WHERE name IS NOT MISSING GROUP BY lower(`v`);

?

Once you have DISTINCT ARRAY or ALL ARRAY in index key that index key can be used with ANY clause in query predicate to pass value to indexer and use as start/stop keys. Very few cases Array index key can be used for covering. All other cases document fetch is required . You should check examples https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/indexing-arrays.html

In your ix4, 3rd key is v which not array value because it outside they ARRAY binding, it will treat as field v in the document.

Both queries does not have fro clause and also you did not specify what is v.

CREATE INDEX ix4 ON default(name, DISTINCT ARRAY lower(v) FOR v IN a1 END));

SELECT META(d).id 
FROM default  AS d 
WHERE d.name = "xyz" 
                   AND ANY v IN d.a1  SATISFIES lower(v) =  "whatever" END;

In this case both “xyz” and “whatever” is passed to indexer and does efficient index scan.

I would recommend do EXPLAIN and look the spans how the predicates transformed to indexscan start/stop values.

https://blog.couchbase.com/n1ql-practical-guide-second-edition/

Thanks, yeah this is the news I didn’t want. We’re moving array from using indexes on arrays because even if I fix the errors you suggested, I still can’t have a covering index with keys (lower(v), v) and I still can’t group by lower(v).

what is exact query you had may be i can suggest index.

CREATE INDEX ix1 ON default( ALL ARRAY lower(v) FOR v IN a1 END, name);

SELECT count(d.`name`),  lower(`v`) 
FROM default AS d
UNNEST d.a1 AS v 
 WHERE  lower(v) IS NOT MISSING 
                    AND d.name IS NOT MISSING
GROUP BY lower(`v`);
  1. Array index key must use ALL and must be leading index key so that UNNEST can use Arrary index scan.
  2. UNNEST alias must match with Array binding variable
  3. must have predicate on leading index key

Checkout examples in https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/indexing-arrays.html

query uses grouping or aggregates, the projection must depends on group keys or aggregates. it should not depend on non-group expression outside aggregates.

1 Like
// Original data model
{
  "type": "tag",
  "key1": "value1",
  "key2": "value2",
  "tags": ["tag1", "tag2", "tag3"]
}

CREATE INDEX `tag_index` ON `BUCKET`(`key1`,`key2`,(distinct (array lower(`t`) for `t` in `tags` end)),`tags`)
// Would really like to have covering w/o embedding whole array :-(

// the goal is to get all unique tags grouped by key1 or key2

// But I've moved to a model that is like:
{
  "type": "tag",
  "key1": "value1",
  "key2": "value2",
  "tag": "tag"
}
// then:
SELECT count(tag) as N, tag FROM `webcms` USE INDEX (`tag_index`) WHERE `key1` = "value1" GROUP BY tag;`
// Index:
//  Indexing + aggregates is easier, I can use keys:
(key1, key2, lower(tag), tag) 
// and it is covering.

{
“type”: “tag”,
“key1”: “value1”,
“key2”: “value2”,
“tags”: [“tag1”, “tag2”, “tag3”]
}

CREATE INDEX `tag_index` ON `webcms ` (ALL distinct (array lower(`t`) for `t` in `tags` end, key1);
SELECT  lower(t) AS tag,  COUNT(1) AS N
FROM webcms AS d
UNNEST d.tags AS t
WHERE lower(t) IS NOT NULL AND d.key1 = "value1"
GROUP BY lower(t);

Try with 6.0.1 it is more efficient because key1 value passed to indexer. Pre 6.0.1 for Unnest only leading key is passed to indexer.

thanks for your help. my todo list is exploding :sob: so ill review these and let you how it goes later.

there’s no way to grab the lower(t) and the not-lower(t) in a covering index is there?

You can. But GROUP BY doesn’t allow you that. That is nothing to do with N1QL. it is SQL standard or mahmatically it is not possible

CREATE INDEX `tag_index1` ON `webcms ` (ALL distinct (array t for `t` in `tags` end, key1);
SELECT  lower(t) AS tag,  COUNT(1) AS N, ARRAY_AGG(t) AS tags
FROM webcms AS d
UNNEST d.tags AS t
WHERE t IS NOT NULL AND d.key1 = "value1"
GROUP BY lower(t);

This type of thing useful only when you filter on array otherwise non array index is better

1 Like

I’ll have to really learn how to use EXPLAIN to understand what these are doing…

thanks again, vsr1

https://blog.couchbase.com/n1ql-practical-guide-second-edition/

There is section understanding IndexScans which has 30+ examples. After few you will get know quickly.

All N1QL blogs https://blog.couchbase.com/author/keshav-murthy/