Effective array index to match multiple items

I created the following array index:
create index test on retailcloudPortalDev(docType, distinct array v for v in supportedChannelIds end ) where docType=‘experience’;

When I search for items which have a single match in the array, things work fine:
explain select meta().id from retailcloudPortalDev where docType=‘experience’ and any v in supportedChannelIds satisfies v=‘channel:best_buy’ end;
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“index”: “test”,
“index_id”: “3b8fcf6c82b303d6”,
“keyspace”: “retailcloudPortalDev”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“experience”",
"“channel:best_buy”"
],
“Inclusion”: 3,
“Low”: [
"“experience”",
"“channel:best_buy”"
]
}
}
],
“using”: “gsi”
}
},
However, when I search for items which have 2 matches in the array, it basically fetch anything. Is there a better way to handle this?
cbq> explain select meta().id from retailcloudPortalDev where docType=‘experience’ and any v in supportedChannelIds satisfies v=‘channel:best_buy’ end and any v in supportedChannelIds satisfies v=‘channel:tmo’ end;
{
“requestID”: “efb37736-3bc7-4154-bfa5-af379b8cac39”,
“signature”: “json”,
“results”: [
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“index”: “test”,
“index_id”: “3b8fcf6c82b303d6”,
“keyspace”: “retailcloudPortalDev”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“null”
],
“Inclusion”: 0
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “retailcloudPortalDev”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((((retailcloudPortalDev.docType) = “experience”) and any v in (retailcloudPortalDev.supportedChannelIds) satisfies (v = “channel:best_buy”) end) and any v in (retailcloudPortalDev.supportedChannelIds) satisfies (v = “channel:tmo”) end)”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “(meta(retailcloudPortalDev).id)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},

FYI

which version of CB do you use?
How about this N1QL’s explain show?

select meta().id from retailcloudPortalDev where docType='experience' and any v in supportedChannelIds satisfies v IN ['channel:best_buy','channel:tmo']  end;

I’m using 4.5.1. However, I notice the explain plan is different when i run
the same query on 4.6.1. Your query is different from mine. Your query
retrieve records which match channel:best_buy or channel:tmo. My query try
to get records which contains both channel:best_buy and channel:tmo.

yes,you are right.
it is interesting query. may be you can only use index once,and then Filter the data ,such as by


SELECT meta(a).id
  FROM (
      select RAW meta().id
        from retailcloudPortalDev 
      where docType='experience' 
        and any v in supportedChannelIds satisfies v='channel:best_buy' end  
  ) AS a JOIN retailcloudPortalDev ON KEYS (a.id)
      where a.docType='experience' 
        and any v in a.supportedChannelIds satisfies v='channel:tmo' end

or index supportedChannelIds to use IN condition is better options?
any suggestions? @geraldss @prasad @vsr1 ?

@atom_yang , @mikehcumail Use 4.6.0 or 4.6.1 https://issues.couchbase.com/browse/MB-21720

for 4.5.1 As @geraldss suggested in Additional help with N1QL dynamic attribute key use Intersect.

Also try this.

SELECT meta().id FROM retailcloudPortalDev WHERE docType='experience' AND ANY v IN supportedChannelIds SATISFIES v='channel:best_buy' END AND 'channel:tmo' IN supportedChannelIds;

If supportedChannelIds array is not too big include in the index so that it will make covered. Also you can remove docType from index keys because in index condition it is equality predicate.

CREATE INDEX test ON retailcloudPortalDev(DISTINCT ARRAY v FOR v IN supportedChannelIds END, supportedChannelIds ) WHERE docType='experience';

2 Likes

If supportedChannelIds is not too big, I think @vsr1 's suggestion is better than use Intersect.

Thanks. Following is the explain plan. Please correct me if I’m wrong. In this case, index scan will filter records based on the 1st condition(channel:best_buy) only. So potentially a lot of records may be fetched unnecessary(2nd condition is not satisfied). However, because it is a covered index, it will not fetch records from data layer. Is this correct? If so, we will experience load on the query layer because the filter operation is done on query layer, right?

cbq> explain SELECT meta().id FROM retailcloud WHERE docType=‘experience’ AND ANY v IN supportedChannelIds SATISFIES v=‘channel:best_buy’ END AND ‘channel:tmo’ IN supportedChannelIds;
{
“requestID”: “aa0bcf88-8e14-42e9-a9ec-7e9b3bddee70”,
“signature”: “json”,
“results”: [
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan”,
“covers”: [
“cover ((distinct (array v for v in (retailcloud.supportedChannelIds) end)))”,
“cover ((retailcloud.supportedChannelIds))”,
“cover ((meta(retailcloud).id))”
],
“filter_covers”: {
“cover ((retailcloud.docType))”: “experience”
},
“index”: “test”,
“index_id”: “eff81c1ae81481f4”,
“keyspace”: “retailcloud”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor("channel:best_buy")”
],
“Inclusion”: 0,
“Low”: [
“"channel:best_buy"”,
“null”
]
}
}
],
“using”: “gsi”
}
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “(((cover ((retailcloud.docType)) = "experience") and any v in cover ((retailcloud.supportedChannelIds)) satisfies (v = "channel:best_buy") end) and ("channe l:tmo" in cover ((retailcloud.supportedChannelIds))))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((meta(retailcloud).id))”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “SELECT meta().id FROM retailcloud WHERE docType=‘experience’ AND ANY v IN supportedChannelIds SATISFIES v=‘channel:best_buy’ END AND ‘channel:tmo’ IN supportedChannelIds;”
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “9.030539ms”,
“executionTime”: “8.982701ms”,
“resultCount”: 1,
“resultSize”: 3183
}
}

@mikehcumail, Yes you right.
In this case, index scan will filter records based on the 1st condition(channel:best_buy) only and produce the records unnecessary if 2nd condition is not satisfied. due to covered index no data fetch happens and query may eliminate more records due to 2nd condition. Based on selectivity you can switch 1st or 2nd condition.