I tried creating an index with a key being an array and plan seems much better (from my understanding).
CREATE INDEX `stuff_index_arr` ON `hx2a-index`([(`@c`.`I`),`@S`,`n`],(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)
select\/*+INDEX(`hx2a-index` `stuff_index`)*\/`hx2a-index`.`@c`.I as`@0`,`hx2a-index`.`@S` as`@1`,`hx2a-index`.`n` as`@2`,meta(`hx2a-index`).id as`@I`from`hx2a-index`where`hx2a-index`.`@T`.`Stuff` is not missing and `hx2a-index`.`@c`.I is not missing and `hx2a-index`.`@S` is not missing and `hx2a-index`.`n` is not missing and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]>=[$1,$2,$3]and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]<=[$4,$5,$6]order by`@0`asc,`@1`asc,`@2`asc,`@I`asc limit $7
"spans": [
{
"exact": true,
"range": [
{
"high": "[$4, $5, $6]",
"inclusion": 3,
"index_key": "[(`@c`.`I`), `@S`, `n`]",
"low": "[$1, $2, $3]"
}
]
}
A couple of followup questions:
Is there a way to avoid meta.id() key and still get doc id from index rows result ? Not a big deal thought
Not sure we need order by also I guess.
How we deal with case when one or more keys ordering is different? Naively I’d think these array comparison do not work. Tried to use “desc” keyword in CREATE index definition like:
CREATE INDEX `stuff_index_arr` ON `hx2a-index`([(`@c`.`I`),`@S` desc,`n`],(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)
but that’s not accepted.
In this case I’d like to treat @S values as sorted in reverse order.
For reference, same query as above with following index gives following plan. Difference is just that 3 first keys are not in array. Not clear to me why this makes any difference.
CREATE INDEX `stuff_index` ON `hx2a-index`((`@c`.`I`),`@S`,`n`,(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)
"spans": [
{
"range": [
{
"inclusion": 0,
"index_key": "(`@c`.`I`)",
"low": "null"
},
{
"inclusion": 0,
"index_key": "`@S`",
"low": "null"
},
{
"inclusion": 0,
"index_key": "`n`",
"low": "null"
}
]
}
Use your query on your data. I just used travel-sample because I do not have your data.
Click on the “Index Advisor” button in the Query Tab. It will show the indexes to create. And then click on the Create Indexes button and it will create those indexes!
meta().id is the document id. So, no there’s no way to get the document id without projecting the document id.
I’ve have also this travel-sample bucket created from the Couchbase console, but not the index you used in the plan output. I’ve asked for index definition because it seems different from ones we create, in particular using keys in an array. This seems to be a game changer but please feel free to confirm.
BTW deleted existing indexes and tried your suggestion and I get with query:
select `hx2a-index`.`@c`.I as`@0`,`hx2a-index`.`@S` as`@1`,`hx2a-index`.`n` as`@2`,meta(`hx2a-index`).id as`@I`from`hx2a-index`where`hx2a-index`.`@T`.`Stuff` is not missing and `hx2a-index`.`@c`.I is not missing and `hx2a-index`.`@S` is not missing and `hx2a-index`.`n` is not missing and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]>=[$1,$2,$3]and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]<=[$4,$5,$6]order by`@0`asc,`@1`asc,`@2`asc,`@I`asc limit $7
I get this advise index, which seems to repeat the keys, in an array first and 2nd separately.
CREATE INDEX adv_c_ISn_S_T_Stuff_c_I_n ON `hx2a-index`([`@c`.`I`, `@S`, `n`],`@S`,`@T`.`Stuff`,`@c`.`I`,`n`)
I’m trying to understand your whole use case - it looks like you are using some properties of the hx2a-index collection to construct an index [ @c.l, @S, n ] ? The application shouldn’t need to construct indexes - Couchbase can do that.
Thanks. The “is not missing” is to workaround a “feature” in indexes that seems to include documents even if they do not contain that key path, if that key is not leading one in index definition. Perhaps this is not needed if we use the array of keys?
If you doing index like this and query [hx2a-index .@c .I,hx2a-index .@S ,hx2a-index .n ]>=[$1,$2,$3]
One need to under stand what you are doing. Array comparisons are complex
UPSERT INTO default VALUES("k01", {"a":1, "b": 0});
UPSERT INTO default VALUES("k02", {"a":1, "b": 1});
UPSERT INTO default VALUES("k03", {"a":1, "b": 10});
UPSERT INTO default VALUES("k04", {"a":2, "b": 0});
UPSERT INTO default VALUES("k05", {"a":2, "b": 1});
UPSERT INTO default VALUES("k06", {"a":2, "b": 3});
CREATE INDEX ix1 ON default([a,b]);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]
Will return “k02”, “k03”, “k04”, “k05” due to nature of array comparison (like string comparison), look k03, k04 a values with in range, so b’s ignored even out of range
As said earlier, @jo1 understands lexicographic ordering when comparing JSON vectors. He wants “k02”, “k03”, “k04”, “k05” in the response. His question is how to specify index creation and the select so that the execution plan does not do a O(N) full index scan but a O(log(N)) one.
That’s why the title of this question is about access to indexes without SQL++, because the idea is to access the lexicographically-sorted index sitting behind SQL++. But surely there should be a way to do that with SQL++. Does
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]
And whatever you want projected you need to project. meta().id is the document id and that’s what I gave because the “doc id” is what was asked for. Not k01, k02 etc. The example with RAW I gave us with a single property - I dont know if raw works with a list of properties.
So - SQL++ will give you exactly what you want, the way you want it, with the complexity you want - and it even shows you what indexes to create.
Now would that work as well without using an array in index creation? Something like:
CREATE INDEX ix1 ON default(a,b);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]
If it works just as well, it would allow with small changes to support a mix of ascending and descending keys, provided that the select expresses a lexicographic comparison with the proper boolean expression, instead of comparing with JSON arrays.
Query works. But it must do complete index scan. It can take long time. If index has 10Million items query must get 10Million and apply predicate in query nothing pushed to indexer.
Index keys are indiviual terms a,b
Query predicate is set [a,b]
right query
SELECT RAW META().id FROM default AS d WHERE a between 1 and 2 and b between 1 and 1
With the index defined that way, it works just fine in the query tab. Both with select comparing JSON vectors and with the equivalent boolean expression specifying the lexicographic comparison (the latter allows descending bits in the key, mixed with ascending ones). Is it enough to take a look at the execution plan “range” and check that the “low” value corresponds to the min JSON vector to conclude that there is proper index dichotomy and not full index scan?
So.. I’m in the same boat as you. I didn’t write the query compiler - and even if I did, the only way to know if it does what someone says it does is to actually try it. Not because they might be wrong, but simply because the question may not have been clear to them, or that they have made assumptions that are different. So try it and whatever the result is - that’s your answer. Just one caution when trying things in the Query Tab - make sure to use the exact same query. There is temptation to substitute query parameters (i.e. $name etc) with literals ‘Smith’ in the Query Tab. That makes a very different query. Instead, use the actual query, with parameters, and set the parameters in the Settings .
. Is it enough to take a look at the execution plan “range” and check that the “low” value corresponds to the min JSON vector to conclude that there is proper index dichotomy and not full index scan?
If the query plan isn’t convincing, you can look at the “items in”.
It means also that there is no efficient way (meaning without full index scan) to “access” the index “through” SQL++ when there are ascending and descending keys mixed together. From an application standpoint it means that implementing pagination is essentially impossible in this case. That’s a pretty strong limitation.
The objective hasn’t changed, it is in the title: accessing a slice of any index. It’s just that to explain it, we have been through many examples and it took a long time to explain that it’s not a kdtree-like multidimensional search with independent conditions on each key. It was quite clear in the title that it was not that, and that indeed the problem was the one articulated, not another one.
Now an index, as in the title, can have a mix of ascending and descending keys. Your suggestion to obtain efficiently a slice of the index is to create the index on a JSON array of the keys. That is not compatible with a mix of ascending and descending keys. The only way to do that with Couchbase is to not have that array, like explained earlier:
"CREATE INDEX ix1 ON default(a,b);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]
If it works just as well, it would allow with small changes to support a mix of ascending and descending keys, provided that the select expresses a lexicographic comparison with the proper boolean expression, instead of comparing with JSON arrays."
Then it would be possible to paginate. But apparently that doesn’t work, the execution plan does a full index scan. That’s what we see when we try it in the console.
Your suggestion to obtain efficiently a slice of the index is to create the index on a JSON array of the keys.
My suggestion was to create the indexes suggested by the Index Advisor for the specific query. Since the query had a predicate on a json array, the index advisor suggested an index on the json array. If selecting by that json array is not the only way you want to query the data, then possibly separate indexes on the array elements would give that flexibility.
That is not compatible with a mix of ascending and descending keys.
What do you mean by a mix of ascending and descending keys? Can you show a query?
If it works just as well,
Did you try it? What did was the resulting query plan? Also - if, after executing, you click on Query Plan, it shows the results of executing the query plan (items in, items out, timing etc)