N1QL query with UNNEST and ARRAY_CONCAT against a large dataset consumes "all" memory

I have a large number of documents containing multiple arrays of objects such as

{
  "name": "42",
  "allies": [
    {
      "id": "1",
      "party": 1
    }
  ],
  "neutral": [
    {
      "id": "2",
      "party": 1
    }
  ],
  "enemy": [
    {
      "id": "unique",
      "party": 100
    }
  ]
}

and want to find any and all documents in which a particular party appears, across a large dataset.

I have tried the following index and query, which “fails” by eating all my available memory:

CREATE INDEX adv_ALL_array_concat_d_allies_d_neutral_d_enemy_party
ON `default`(ALL ARRAY `data`.`party` FOR data IN ARRAY_CONCAT((`allies`), (`neutral`), (`enemy`)) END)

    SELECT data.party
    FROM `default` AS d
    UNNEST ARRAY_CONCAT(d.allies, d.neutral, d.enemy) AS data
    WHERE data.party IS VALUED
        AND META(d).id LIKE 'a:%'
    GROUP BY data.party
    HAVING COUNT(1) > 1

To try and fix the memory issue I used a join to “stream” results, but memory is still being consumed:

SELECT biggerThan.party,
       ARRAY_AGG(META(d2).id) AS appear_in
FROM (
    SELECT data.party
    FROM `default` AS d
    UNNEST ARRAY_CONCAT(d.allies, d.neutral, d.enemy) AS data
    WHERE data.party IS VALUED
        AND META(d).id LIKE 'a:%'
    GROUP BY data.party
    HAVING COUNT(1) > 1 ) AS biggerThan
    JOIN `default` AS d2 ON biggerThan.party IN ARRAY_CONCAT(d2.allies, d2.neutral, d2.enemy)[*].party
    AND biggerThan.party IS VALUED
    AND META(d2).id LIKE 'a:%'
GROUP BY biggerThan.party

What would be the recommended way to execute a query like this against a large database? I'm guessing my problem is around the UNNEST and ARRAY_CONCAT?

Make sure the query uses a covering index. On Enterprise Edition (EE) it also uses index aggregation.

CREATE INDEX ixt1 ON default(ALL ARRAY_CONCAT(allies[*].party, enemy[*].party, neutral[*].party))                                        
WHERE meta().id LIKE 'a:%';

SELECT party
FROM default AS d
UNNEST  ARRAY_CONCAT(d.allies[*].party, d.enemy[*].party, d.neutral[*].party) AS party
WHERE META(d).id LIKE "a:%" AND party IS VALUED
GROUP BY party
HAVING COUNT(1) > 1;

cool

The index doesn't pick up documents where one or more of the arrays are empty, though. For example, this document wouldn't be picked up:

{
  "name": "42",
  "allies": [
    {
      "id": "1",
      "party": 1
    },
    {
      "id": "2",
      "party": 1
    }
  ],
  "neutral": [],
  "enemy": [
    {
      "id": "3",
      "party": 3
    }
  ]
}

This index also seems to “miss” documents if some JSON array is empty.

Replace, in both the index and the query:

ARRAY_CONCAT(allies[*].party, enemy[*].party, neutral[*].party)
==>>
ARRAY v.party FOR v IN ARRAY_CONCAT(IFMISSINGORNULL(allies,[]), IFMISSINGORNULL(enemy,[]), IFMISSINGORNULL(neutral,[])) END
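My reading of the symptom above, modelled as a small Python sketch. The `MISSING` sentinel and the `star_party`, `array_concat`, and `ifmissing` helpers are illustrative stand-ins for the N1QL semantics, not Couchbase APIs:

```python
MISSING = object()  # stand-in for the N1QL MISSING value

def star_party(arr):
    """Model arr[*].party: with no elements to project from,
    the expression evaluates to MISSING (the symptom in the thread)."""
    if arr is MISSING or not arr:
        return MISSING
    return [item.get("party") for item in arr]

def array_concat(*arrays):
    """Model ARRAY_CONCAT: MISSING in any operand makes the whole
    result MISSING, so the document gets no index entry at all."""
    if any(a is MISSING for a in arrays):
        return MISSING
    return [x for a in arrays for x in a]

def ifmissing(value, default):
    """Model IFMISSING: substitute a default for a MISSING value."""
    return default if value is MISSING else value

# The document from the example above: "neutral" is an empty array.
doc = {"allies": [{"id": "1", "party": 1}, {"id": "2", "party": 1}],
       "neutral": [], "enemy": [{"id": "3", "party": 3}]}

unguarded = array_concat(star_party(doc["allies"]),
                         star_party(doc["neutral"]),  # MISSING
                         star_party(doc["enemy"]))
guarded = array_concat(ifmissing(star_party(doc["allies"]), []),
                       ifmissing(star_party(doc["neutral"]), []),
                       ifmissing(star_party(doc["enemy"]), []))

print(unguarded is MISSING)  # True: the whole document is dropped
print(guarded)               # [1, 1, 3]: every party gets indexed
```

In other words, the guard keeps one empty or absent array from turning the entire concatenation into MISSING.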

Alternatively:

CREATE INDEX ixt ON default(
  ARRAY_CONCAT(IFMISSING(allies[*].party,[]), IFMISSING(enemy[*].party,[]), IFMISSING(neutral[*].party,[])))
WHERE meta().id LIKE 'a:%';

should cover with:

SELECT party                                                                                                                        
FROM default AS d
UNNEST ARRAY_CONCAT(IFMISSING(d.allies[*].party,[]), IFMISSING(d.enemy[*].party,[]), IFMISSING(d.neutral[*].party,[])) AS party
WHERE META(d).id LIKE "a:%" AND party IS VALUED
GROUP BY party
HAVING COUNT(1) > 1;

Thanks, that index picks up everything, but the query still has the issue of eating memory.

It isn't possible to stream results early because of the GROUP BY: every input row has to be seen before any group can be emitted, so that each distinct value is returned exactly once with a complete count. For a large number of distinct “party” values this carries some unavoidable overhead; I don't see a way to rewrite the statement to reduce that overhead and still produce the same result. (Every variant requires gathering data to count parties across all array elements of all documents.)
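Why the GROUP BY forces full accumulation can be sketched in Python. This is a deliberately simplified model of hash aggregation, not the actual query engine; the document layout matches the examples above:

```python
from collections import Counter

def unnested_parties(docs, prefix="a:"):
    """Model the UNNEST: yield one party value per array element,
    restricted to documents whose key matches META(d).id LIKE 'a:%'."""
    for doc_id, doc in docs.items():
        if not doc_id.startswith(prefix):
            continue
        for field in ("allies", "neutral", "enemy"):
            for item in doc.get(field, []):  # the IFMISSING(..., []) analogue
                if item.get("party") is not None:
                    yield item["party"]

def parties_seen_more_than_once(docs):
    """Model GROUP BY party HAVING COUNT(1) > 1: the entire Counter must
    be built before the first result can be emitted, so memory grows with
    the number of distinct parties, regardless of the final result size."""
    counts = Counter(unnested_parties(docs))
    return sorted(p for p, n in counts.items() if n > 1)

docs = {
    "a:1": {"allies": [{"id": "1", "party": 1}], "neutral": [],
            "enemy": [{"id": "3", "party": 3}]},
    "a:2": {"allies": [{"id": "2", "party": 1}]},
    "b:1": {"allies": [{"id": "9", "party": 1}]},  # excluded by the prefix
}
print(parties_seen_more_than_once(docs))  # party 1 appears in two 'a:%' docs
```

The generator streams rows one at a time, but the `Counter` it feeds is what accumulates: that is the aggregation state no rewrite of this statement can avoid.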

When you state “eating memory”, what is the usage you observe (and where/what metric/tool are you using) and the result count from the statement?

Also, how many UNNESTed items are there, i.e.

SELECT count(1)
FROM default AS d
UNNEST ARRAY_CONCAT(IFMISSING(d.allies[*].party,[]), IFMISSING(d.enemy[*].party,[]), IFMISSING(d.neutral[*].party,[])) AS party
WHERE META(d).id LIKE "a:%" AND party IS VALUED
;

?

Are you using Enterprise Edition or Community Edition? - And the version?
