Writing Queries with multiple array attributes (Nested array indexes used)

Hi

We have the below query and the attributes are arrays,
entity.locationDSL.parameters.clusters - clusters is an array
entity.condition.productGtins - productGtins is an array

Query:

SELECT META(promotions).id AS _ID, META(promotions).cas AS _CAS, promotions.* FROM promotions
USE INDEX (tmp_nested_index_entity_state_cluster_gtin_class_PR_CL using GSI)
WHERE (_class = ‘XXXX’ OR _class = ‘YYYY’) AND entity.state IN [‘APPROVED’ , ‘ADVERTISED’]
AND entity.startDateTime >= 1539187800000
AND ANY clusterId IN entity.locationDSL.parameters.clusters SATISFIES
(ANY gtin IN entity.condition.productGtins SATISFIES [clusterId,gtin] = [“f9a5d04f-5980-4a0a-9ee4-5af726962f9e”,“31738147500004”] END) END

We first created 2 separate composite array indexes on above attributes which found to be less performant compared to nested array index created as below:

Index1:
CREATE INDEX nested_index_entity_state_cluster_gtin_class_PR_CL
ON promotions(DISTINCT ARRAY (DISTINCT ARRAY [clster,gtin] FOR gtin IN entity.condition.productGtins END) FOR clster IN entity.locationDSL.parameters.clusters END, entity.startDateTime)
PARTITION BY HASH(entity.state)
WHERE ( _class = ‘XXXX’ OR _class = ‘YYYY’) AND entity.state IN [‘APPROVED’ , ‘ADVERTISED’]
using GSI with {‘num_replica’:1,‘num_partition’:6};

Now we have a change in data model and we have an addition attribute coming up along with above two attributes which is again an array
entity.condition.subConditions.productGtins - here subConditions and productGtins both are arrays.

To support the change in data model have created one more index as below,

CREATE INDEX tmp_nested_index_entity_state_cluster_subcondition_gtin_class_PR_CL
ON promotions(DISTINCT ARRAY (DISTINCT ARRAY (DISTINCT ARRAY [clster,gtin] FOR gtin IN subCond.productGtins END) FOR subCond IN entity.condition.subConditions END) FOR clster IN entity.locationDSL.parameters.clusters END,
entity.startDateTime)
PARTITION BY HASH(entity.state)
WHERE ( _class = ‘XXXX’ OR _class = ‘YYYY’) AND entity.state IN [‘APPROVED’ , ‘ADVERTISED’]
using GSI with {‘num_replica’:1,‘num_partition’:6};

However, the below query is not taking up the above index,

New Query
SELECT META(promotions).id AS _ID, META(promotions).cas AS _CAS, promotions.* FROM promotions
USE INDEX (tmp_nested_index_entity_state_cluster_gtin_class_PR_CL,
tmp_nested_index_entity_state_cluster_subcondition_gtin_class_PR_CL USING GSI)
WHERE (_class = ‘XXXX’ OR _class = 'YYYY) AND entity.state IN [‘APPROVED’ , ‘ADVERTISED’]
AND entity.startDateTime >= 1539187800000
AND (ANY clusterId IN entity.locationDSL.parameters.clusters SATISFIES
(ANY gtin IN entity.condition.productGtins SATISFIES [clusterId,gtin] = [“f9a5d04f-5980-4a0a-9ee4-5af726962f9e”,“31738147500004”] END)
OR (ANY subCond IN entity.condition.subConditions SATISFIES (ANY gtin IN subCond.productGtins SATISFIES [clusterId,gtin] = [“f9a5d04f-5980-4a0a-9ee4-5af726962f9e”,“31738147500004”] END) END) END)

Need help in doing things correctly in indexes and query??

Regards,
Venkat

CREATE INDEX ix1 ON promotions(DISTINCT ARRAY (DISTINCT ARRAY [clster, gtin] FOR gtin IN ARRAY_CONCAT(entity.condition.productGtins, entity.condition.subConditions[*].productGtins) END) FOR clster IN entity.locationDSL.parameters.clusters END,
entity.startDateTime)
PARTITION BY HASH(entity.state)
WHERE _class IN ["XXXX","YYYY"] AND entity.state IN ["APPROVED" , "ADVERTISED"]
USING GSI with {"num_replica":1,"num_partition":6};

SELECT META(p).id AS _ID, META(p).cas AS _CAS, p.*
FROM promotions AS p
USE INDEX (ix1)
WHERE p._class IN ["XXXX","YYYY"]
      AND p.entity.state IN ["APPROVED" , "ADVERTISED"]
      AND p.entity.startDateTime >= 1539187800000
      AND (ANY clusterId IN p.entity.locationDSL.parameters.clusters
           SATISFIES (ANY gtin IN ARRAY_CONCAT(p.entity.condition.productGtins, p.entity.condition.subConditions[*].productGtins)
                      SATISFIES [clusterId, gtin] = ["f9a5d04f-5980-4a0a-9ee4-5af726962f9e","31738147500004"] END) END);

You are creating Cartesian product of multiple arrays. This can result in huge index.
Explore Full Text Search https://blog.couchbase.com/n1ql-and-search-how-to-leverage-fts-index-in-n1ql-query/
https://blog.couchbase.com/tag/full-text-search/

Hi
Thanks for the response, I have followed your suggestions but unfortunately still primary index is picked though we have given hints to use index. Can you have a look??

Index:

Query:

Regards,
Venkat

It should work. what version.

"as": "p",
                    "index": "ix1",
                    "index_id": "efce3c28f5bbafd2",
                    "index_projection": {
                        "primary_key": true
                    },
                    "keyspace": "default",
                    "namespace": "default",
                    "spans": [
                        {
                            "exact": true,
                            "range": [
                                {
                                    "high": "[\"f9a5d04f-5980-4a0a-9ee4-5af726962f9e\", \"31738147500004\"]",
                                    "inclusion": 3,
                                    "low": "[\"f9a5d04f-5980-4a0a-9ee4-5af726962f9e\", \"31738147500004\"]"
                                },
                                {
                                    "inclusion": 1,
                                    "low": "1539187800000"
                                }
                            ]
                        }
                    ],

@vsr1

Version is 6.0.2

Regards
Venkat

Try latest 6.0.4 or 6.5.1

Hi
It’s working with version 6.5. 0.

So the older versions doesn’t support it??

Regards,
Venkat

It supports. It works on latest 6.0.x (as you are in 6.0.2 (may be some bug), try latest version of 6.0.4 and if works upgrade to it).
FYI: Also you using PARTITION BY HASH(entity.state), entity.state IN [“APPROVED” , “ADVERTISED”], “num_partition”:6 only 2 partitions might be used.

Sure , many thanks.

Regards,
Venkat

Hi

Can you suggest how the query gets transformed if the parameters in the query are also array??

like if clusterId and gtin turns out to be array??

Regards,
Venkat

SATISFIES [clusterId, gtin] = [$cid, $gid ] END

SATISFIES [clusterId, gtin] IN $cidgidpairs END

$cidgidpairs = [[“f9a5d04f-5980-4a0a-9ee4-5af726962f9e”,“31738147500004”], [“f9a5d04f-5980-4a0a-9ee4-5af726962f9f”,“31738147500005”]]

SATISFIES [clusterId, gtin] IN [$pair1, $pair2] END

$pair1 = [“f9a5d04f-5980-4a0a-9ee4-5af726962f9e”,“31738147500004”]
$pair2 = [“f9a5d04f-5980-4a0a-9ee4-5af726962f9f”,“31738147500005”]

If need help post the parameters.

@vsr1 thanks

Regards
Venkat