Counting/Grouping Based on Var in Array Expression (View-like Grouping)?

Hello all!

I’ve got some documents that have one or more attributes that may be null or missing, along with some consistently present data. Something like:

test:1
{
  "type": "test",
  "u": "36b6a406",
  "a": "w",
  "b": "x"
}

test:2
{
  "type": "test",
  "u": "a38c849a",
  "b": "x",
  "c": "y",
  "d": null
}

test:3
{
  "type": "test",
  "u": "7f8629a9",
  "c": "y",
  "d": "z"
}

These “test” documents each have four attributes that may be missing or null (“a”, “b”, “c”, and “d”), though technically there will always be at least one of them valued. There are also attributes like “type” or “u” that will always be present.

An array index allows me to search for documents on “a”, “b”, “c”, or “d”, and then I want to count the hits against the matching values. The index is set up like this:

CREATE INDEX `my_test_idx`
    ON `test_bucket`(DISTINCT ARRAY v FOR v
                     IN [`a`, `b`, `c`, `d`]
                     WHEN v IS VALUED END)
 WHERE (`type` = 'test')

This allows me to do queries like this:

SELECT RAW meta().id
  FROM `test_bucket`
 WHERE (ANY v IN [`a`, `b`, `c`, `d`] SATISFIES v IN ["x", "z"] END)
   AND (`type` = "test")

This finds all the test documents where “x” or “z” appears in one of those four attributes. That query yields the expected:

[ "test:1", "test:2", "test:3" ]

This is great, but what I would like to do, ultimately, is get those counts by value.
Something like this:

SELECT COUNT(v), v
  FROM `test_bucket`
 WHERE (ANY v IN [`a`, `b`, `c`, `d`] SATISFIES v IN ["x", "z"] END)
   AND (`type` = "test")
 GROUP BY v

I want to see the count of docs that matched “x” and the count of docs that matched “z”. So far I can’t find a way to recover the v that actually powers the match, which is what would allow the counting.

To give some further context, with an older-school view you could accomplish the above by emitting each of the non-null attributes as the key, and then using the built-in _count reduce to handle the reduction. For example:

function (doc, meta) {
  if (doc.type === 'test') {
    if (doc.a) emit(doc.a, null);
    if (doc.b) emit(doc.b, null);
    if (doc.c) emit(doc.c, null);
    if (doc.d) emit(doc.d, null);
  }
}

Querying that view with the keys [“x”,“z”] yields the same documents, but by setting group=true (again with the _count reduce) you get back:

{
  rows: [
    { key: "x", value: 2 },
    { key: "z", value: 1 }
  ]
}

Which is the ultimate target I’m after in the N1QL…

Are there any suggested approaches for something elegant here? I’m sure I’m missing something blatantly obvious. Note that in the real case there are more like 7 attributes, and simple brute-force approaches get messed up once you start considering searches that would hit multiple attribute values in a single document (e.g., searching for “y” and “z” above should yield counts of 2 and 1 respectively, i.e. test:3 gets counted twice). Again, that works just fine with the view but seems harder in N1QL.
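To make that target behavior concrete, here is a minimal sketch (plain Node.js, no Couchbase SDK) that simulates the view’s map function plus a _count reduce with group=true over the three sample documents above:

```javascript
// The sample documents from the post.
const docs = {
  "test:1": { type: "test", u: "36b6a406", a: "w", b: "x" },
  "test:2": { type: "test", u: "a38c849a", b: "x", c: "y", d: null },
  "test:3": { type: "test", u: "7f8629a9", c: "y", d: "z" },
};

// Simulates querying the view with keys=[...] and group=true:
// emit each non-null attribute as a key, then _count per distinct key.
function groupedCounts(keys) {
  const counts = {};
  for (const doc of Object.values(docs)) {
    if (doc.type !== "test") continue;
    for (const attr of ["a", "b", "c", "d"]) {
      const v = doc[attr];
      if (v && keys.includes(v)) counts[v] = (counts[v] || 0) + 1;
    }
  }
  return counts;
}

console.log(groupedCounts(["x", "z"])); // { x: 2, z: 1 }
console.log(groupedCounts(["y", "z"])); // test:3 counted for both keys: { y: 2, z: 1 }
```

Note how test:3 contributes to both “y” and “z” in the second query, which is exactly the per-key (not per-document) counting the view gives.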

Any and all help would be appreciated. Thank you!

Use an ALL ARRAY index (for covering), and use the same index binding variable (i.e. v) as the UNNEST alias.
Also, when a field is MISSING and is used in an array constructor, it becomes NULL ([a] ==> [null] instead of [MISSING]), because an array can’t hold MISSING values: the position of each element within the array is significant.

CREATE INDEX `my_test_idx`
    ON `test_bucket`(ALL ARRAY v FOR v
                     IN [`a`, `b`, `c`, `d`]
                     WHEN v IS VALUED END)
 WHERE (`type` = 'test');

SELECT COUNT(1) AS value, v AS key
  FROM `test_bucket` AS t
UNNEST [t.`a`, t.`b`, t.`c`, t.`d`] AS v
 WHERE v IN ["x", "z"] AND (t.`type` = "test")
 GROUP BY v;
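As a rough cross-language analogy for the MISSING-to-NULL coercion described above (JavaScript has undefined rather than MISSING, so this only illustrates the positional point, not N1QL semantics): an absent property still occupies its slot in an array literal, which is why the constructor must coerce it to something rather than drop it.

```javascript
const doc = { type: "test", b: "x" }; // no "a" attribute at all

const arr = [doc.a, doc.b]; // absent property -> undefined, but the slot survives
console.log(arr.length);    // 2 — the position is preserved, not dropped

// N1QL coerces the absent slot to NULL instead; mapping undefined -> null
// mirrors that, and is why the index filters with WHEN v IS VALUED.
const n1qlLike = arr.map((v) => (v === undefined ? null : v));
console.log(n1qlLike); // [ null, 'x' ]
```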

If your documents are simple (no sub-objects or arrays) and you want to search any field, you can try this:

CREATE INDEX ix1 ON  test_bucket( ALL OBJECT_VALUES(self)) WHERE type = "test";
SELECT v AS key , COUNT(1) AS cnt
FROM test_bucket AS t
UNNEST OBJECT_VALUES(t) AS v
WHERE t.type = "test" AND v IN ["x","y"]
GROUP BY v;

If you need to count each document only once per value, you can use COUNT(DISTINCT META(t).id).
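To see when that matters, here is a small sketch (plain JavaScript, using a hypothetical document “test:4” that holds the same value in two attributes — an assumption, not one of the sample docs): UNNEST would produce two rows for it, so COUNT(1) reports 2 while counting distinct document ids reports 1.

```javascript
// Hypothetical doc where the same value appears in two attributes.
const doc = { id: "test:4", type: "test", a: "x", b: "x" };

// One row per unnested array element that matches, as UNNEST + WHERE would yield.
const rows = [];
for (const v of [doc.a, doc.b, doc.c, doc.d]) {
  if (v === "x") rows.push({ id: doc.id, v });
}

console.log(rows.length);                         // COUNT(1)                   -> 2
console.log(new Set(rows.map((r) => r.id)).size); // COUNT(DISTINCT META(t).id) -> 1
```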

Works like a champ, and I appreciate the commentary. Thank you for your help!