I am covering an object of which includes a list of tokens in a document.
Example:
{
“eC”: “N”,
“token”: [
“582868”
],
}
the majority of our use cases will only have one token, but we may have a few docs that consist of multiple(2).
in our index we are covering the field as such:
CREATE INDEX IDX_Dt ON BN(((distinct (token)))PARTITION BY hash(token) WITH { “defer_build”:true, “num_replica”:1, “num_partition”:32 }
in our query when I am trying to query over this data, for example grouping on token, the server is performing a fetch. the information does not seem to be in the index node. Is their a way to better define this index, or does the structure need to change for a better performing query? we are iterating over the data to get a result.
this is what the query looks like:
SELECT ARRAY tkn FOR tkn IN t.token WHEN tkn IN [ “cdafac6a” ] END AS acctToken, t COUNT(1) AS cnts
FROM BN AS t
WHERE ANY tkn in t.token SATISFIES tkn IN [“cdafac6a”] END
GROUP BY t.token;
This is not covered . Index has flattened array each as separate item.
As you mentioned you have only 1 or 2 in array
Use the following index, which makes it covered. Array stored as last key in the index as scalar make it cover.
CREATE INDEX `IDX_Dt` ON `BN` (((distinct ( `token` )), `token`)
PARTITION BY hash( `token` ) WITH { “defer_build”:true, “num_replica”:1, “num_partition”:32 }
If you want count the each value how many times present across the documents you can do the following.
CREATE INDEX ix1 ON `BN` ( ALL token);
SELECT tk AS token, COUNT(1) AS cnts
FROM BN AS t
UNNEST t.token AS tk
WHERE tk IN ["cdafac6a", "xyz"]
GROUP BY tk;
Thanks for your response. That does seem to be much better in performance. But by what you wrote
"As you mentioned you have only 1 or 2 in array
Use the following index, which makes it covered. Array stored as last key in the index as scalar make it cover. "
CREATE INDEX `IDX_Dt` ON `BN` (((distinct ( `token` )), `token`)
PARTITION BY hash( `token` ) WITH { “defer_build”:true, “num_replica”:1, “num_partition”:32 }
It seems like even if I have more than 2 elements because i am using the array as the last key even if I were to have a 3rd or 4th element or more in the array it would perform in the same manner and the index would be the same? Am I understanding correctly?
Number of elements doesn’t matter. The reason i mentioned Assume you have 10000 elements in array, storing such big array as part of index can make index ballon. You need to measure performance covers vs non-cover due to index record size