Composite index from an array of objects

Hello,
I have a bucket with documents like this:

{
	"fd": [{
			"ac": "AF",
			"d": 30,
			"fn": 1096,
			"ym": 1911
		}, {
			"ac": "AF",
			"d": 9,
			"fn": 1053,
			"ym": 2009
		}, {
			"ac": "BA",
			"d": 29,
			"fn": 1002,
			"ym": 2006
		}, {
			"ac": "QR",
			"d": 6,
			"fn": 1764,
			"ym": 2011
		}
	],
	"ts": 1572258925.187879,
	"yld": [
		"PEKAMS",
		"CDGORY"
	]
}

I am trying to index, as a composite index the following fields: fd.ac, fd.ym, fd.d, i.e. I need to be able to perform query only targeting the first 1 or 2 fields.

I’ve create an index like this:

CREATE INDEX idx1 ON bucket (DISTINCT ARRAY [f.ac, f.ym, f.d] FOR f IN fd END)

But Couchbase is not able to use it as a composite index, only as a “normal” index, e.g. each element in the array pushes only 1 key in the index.

For instance this query will not work:

SELECT RAW meta().id FROM bucket WHERE ANY f IN fd SATISFIES (f.ac == ‘AF’) END

Couchbase will tell me that no index is suitable for this query (even if I explicitly add a USE INDEX).

Only queries like this, specifying all the fields work:

SELECT RAW meta().id FROM bucket WHERE ANY f IN fd SATISFIES [f.ac, f.ym, f.d] == [“AF”, 1911, 30] END

Going through the documentation I’m not able to find a case like that. Is there a way to do this in Couchbase? Currently I’m using version 5.5.

Thanks a lot for your help!

DISTINCT ARRAY [f.ac, f.ym, f.d] FOR f IN fd END

You are combining 3 fields into array and indexing as scalar array consist of these fields.
To qualify array index your predicate SATISFIES clause must have reference [f.ac, f.ym, f.d]

f.ac = “AF”

SELECT RAW meta().id 
FROM bucket 
WHERE ANY f IN fd SATISFIES [f.ac, f.ym, f.d] >= ["AF"] 
                        AND [f.ac, f.ym, f.d] < [SUCCESSOR("AF")] END;

f.ac = “AF” AND f.ym = 1911

SELECT RAW meta().id 
FROM bucket 
WHERE ANY f IN fd SATISFIES [f.ac, f.ym, f.d] >= ["AF",1911] 
                        AND [f.ac, f.ym, f.d] < ["AF", SUCCESSOR(1911)] END;

f.ac BETWEEN “AF” AND “BC” AND f.ym > 1900 AND f.ym < = 2000

SELECT RAW meta().id 
FROM bucket 
WHERE ANY f IN fd SATISFIES [f.ac, f.ym, f.d] >= ["AF",1900] 
                        AND [f.ac, f.ym, f.d] < ["BC", SUCCESSOR(2000)] 
              AND  f.ac BETWEEN "AF" AND "BC"  AND f.ym  > 1900 AND f.ym < = 2000 END;

One of the predicate ([f.ac, f.ym, f.d]) used to index selection and push the values index spans these must be superset (otherwise false negatives possible, i.e. indexer eliminate qualified values, result in wrong results). Other predicates (f.ac, f.ym) to eliminate false positives (i.e. eliminate extra values don’t give non qualified values)

These are complex and extra caution needed. Checkout

1 Like

Thanks for the explanation and thanks for the this clever way to use the index. However is there a cleaner/easier way to make the index more like a composite index ?

Yes. It is in the road map. cc @keshav_m, @binh.le, @Kamini_Jagtiani

This is a good workaround and should be a part of the main documentation. Also, please consider adding support for the composite index and optimized query in the new 7 release (maybe in later minor versions).