Access to index without SQL++ query

jo1 · May 13, 2025, 11:07am

I tried creating an index with a key being an array and plan seems much better (from my understanding).

CREATE INDEX `stuff_index_arr` ON `hx2a-index`([(`@c`.`I`),`@S`,`n`],(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)

select\/*+INDEX(`hx2a-index` `stuff_index`)*\/`hx2a-index`.`@c`.I as`@0`,`hx2a-index`.`@S` as`@1`,`hx2a-index`.`n` as`@2`,meta(`hx2a-index`).id as`@I`from`hx2a-index`where`hx2a-index`.`@T`.`Stuff` is not missing and `hx2a-index`.`@c`.I is not missing and `hx2a-index`.`@S` is not missing and `hx2a-index`.`n` is not missing and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]>=[$1,$2,$3]and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]<=[$4,$5,$6]order by`@0`asc,`@1`asc,`@2`asc,`@I`asc limit $7

                       "spans": [
                            {
                                "exact": true,
                                "range": [
                                    {
                                        "high": "[$4, $5, $6]",
                                        "inclusion": 3,
                                        "index_key": "[(`@c`.`I`), `@S`, `n`]",
                                        "low": "[$1, $2, $3]"
                                    }
                                ]
                            }

A couple of followup questions:

Is there a way to avoid meta.id() key and still get doc id from index rows result ? Not a big deal thought
Not sure we need order by also I guess.
How we deal with case when one or more keys ordering is different? Naively I’d think these array comparison do not work. Tried to use “desc” keyword in CREATE index definition like:

CREATE INDEX `stuff_index_arr` ON `hx2a-index`([(`@c`.`I`),`@S` desc,`n`],(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)

but that’s not accepted.
In this case I’d like to treat @S values as sorted in reverse order.

Thanks

jo1 · May 13, 2025, 12:03pm

For reference, same query as above with following index gives following plan. Difference is just that 3 first keys are not in array. Not clear to me why this makes any difference.

CREATE INDEX `stuff_index` ON `hx2a-index`((`@c`.`I`),`@S`,`n`,(meta().`id`)) WHERE ((`@T`.`Stuff`) is not missing)

"spans": [
                            {
                                "range": [
                                    {
                                        "inclusion": 0,
                                        "index_key": "(`@c`.`I`)",
                                        "low": "null"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`@S`",
                                        "low": "null"
                                    },
                                    {
                                        "inclusion": 0,
                                        "index_key": "`n`",
                                        "low": "null"
                                    }
                                ]
                            }

mreiche · May 13, 2025, 1:47pm

Use your query on your data. I just used travel-sample because I do not have your data.

Click on the “Index Advisor” button in the Query Tab. It will show the indexes to create. And then click on the Create Indexes button and it will create those indexes!

meta().id is the document id. So, no there’s no way to get the document id without projecting the document id.

jo1 · May 13, 2025, 3:19pm

I’ve have also this travel-sample bucket created from the Couchbase console, but not the index you used in the plan output. I’ve asked for index definition because it seems different from ones we create, in particular using keys in an array. This seems to be a game changer but please feel free to confirm.

Thanks

jo1 · May 13, 2025, 3:40pm

BTW deleted existing indexes and tried your suggestion and I get with query:

select `hx2a-index`.`@c`.I as`@0`,`hx2a-index`.`@S` as`@1`,`hx2a-index`.`n` as`@2`,meta(`hx2a-index`).id as`@I`from`hx2a-index`where`hx2a-index`.`@T`.`Stuff` is not missing and `hx2a-index`.`@c`.I is not missing and `hx2a-index`.`@S` is not missing and `hx2a-index`.`n` is not missing and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]>=[$1,$2,$3]and[`hx2a-index`.`@c`.I,`hx2a-index`.`@S`,`hx2a-index`.`n`]<=[$4,$5,$6]order by`@0`asc,`@1`asc,`@2`asc,`@I`asc limit $7

I get this advise index, which seems to repeat the keys, in an array first and 2nd separately.

CREATE INDEX adv_c_ISn_S_T_Stuff_c_I_n ON `hx2a-index`([`@c`.`I`, `@S`, `n`],`@S`,`@T`.`Stuff`,`@c`.`I`,`n`)

mreiche · May 13, 2025, 4:28pm

Right. They are combined for the inequalities

[hx2a-index.@c.I,hx2a-index.@S,hx2a-index.n]>=[$1,$2,$3]

and separate for the is-not-missing.

hx2a-index.n is not missing

I’m trying to understand your whole use case - it looks like you are using some properties of the hx2a-index collection to construct an index [ @c.l, @S, n ] ? The application shouldn’t need to construct indexes - Couchbase can do that.

jo1 · May 13, 2025, 5:01pm

Thanks. The “is not missing” is to workaround a “feature” in indexes that seems to include documents even if they do not contain that key path, if that key is not leading one in index definition. Perhaps this is not needed if we use the array of keys?

vsr1 · May 13, 2025, 8:37pm

If you doing index like this and query [hx2a-index .@c .I,hx2a-index .@S ,hx2a-index .n ]>=[$1,$2,$3]

One need to under stand what you are doing. Array comparisons are complex

UPSERT INTO default VALUES("k01", {"a":1, "b": 0});
UPSERT INTO default VALUES("k02", {"a":1, "b": 1});
UPSERT INTO default VALUES("k03", {"a":1, "b": 10});
UPSERT INTO default VALUES("k04", {"a":2, "b": 0});
UPSERT INTO default VALUES("k05", {"a":2, "b": 1});
UPSERT INTO default VALUES("k06", {"a":2, "b": 3});

CREATE INDEX ix1 ON default([a,b]);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]

Will return “k02”, “k03”, “k04”, “k05” due to nature of array comparison (like string comparison), look k03, k04 a values with in range, so b’s ignored even out of range

Hawkeye · May 14, 2025, 5:06am

As said earlier, @jo1 understands lexicographic ordering when comparing JSON vectors. He wants “k02”, “k03”, “k04”, “k05” in the response. His question is how to specify index creation and the select so that the execution plan does not do a O(N) full index scan but a O(log(N)) one.

That’s why the title of this question is about access to indexes without SQL++, because the idea is to access the lexicographically-sorted index sitting behind SQL++. But surely there should be a way to do that with SQL++. Does

SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]

Do that?

mreiche · May 14, 2025, 6:23am

Yes. See -

And whatever you want projected you need to project. meta().id is the document id and that’s what I gave because the “doc id” is what was asked for. Not k01, k02 etc. The example with RAW I gave us with a single property - I dont know if raw works with a list of properties.

So - SQL++ will give you exactly what you want, the way you want it, with the complexity you want - and it even shows you what indexes to create.

Hawkeye · May 14, 2025, 10:50am

Now would that work as well without using an array in index creation? Something like:

CREATE INDEX ix1 ON default(a,b);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]

If it works just as well, it would allow with small changes to support a mix of ascending and descending keys, provided that the select expresses a lexicographic comparison with the proper boolean expression, instead of comparing with JSON arrays.

vsr1 · May 14, 2025, 11:35am

Query works. But it must do complete index scan. It can take long time. If index has 10Million items query must get 10Million and apply predicate in query nothing pushed to indexer.
Index keys are indiviual terms a,b
Query predicate is set [a,b]

right query

SELECT RAW META().id FROM default AS d WHERE a between 1 and 2 and b between 1 and 1

mreiche · May 14, 2025, 1:36pm

Now would that work…

Try it in the query tab.

Hawkeye · May 14, 2025, 1:50pm

With the index defined that way, it works just fine in the query tab. Both with select comparing JSON vectors and with the equivalent boolean expression specifying the lexicographic comparison (the latter allows descending bits in the key, mixed with ascending ones). Is it enough to take a look at the execution plan “range” and check that the “low” value corresponds to the min JSON vector to conclude that there is proper index dichotomy and not full index scan?

Hawkeye · May 14, 2025, 1:56pm

No that select does not do at all the same thing. It does not return all the slice of the index.

mreiche · May 14, 2025, 3:04pm

So.. I’m in the same boat as you. I didn’t write the query compiler - and even if I did, the only way to know if it does what someone says it does is to actually try it. Not because they might be wrong, but simply because the question may not have been clear to them, or that they have made assumptions that are different. So try it and whatever the result is - that’s your answer. Just one caution when trying things in the Query Tab - make sure to use the exact same query. There is temptation to substitute query parameters (i.e. $name etc) with literals ‘Smith’ in the Query Tab. That makes a very different query. Instead, use the actual query, with parameters, and set the parameters in the Settings .

. Is it enough to take a look at the execution plan “range” and check that the “low” value corresponds to the min JSON vector to conclude that there is proper index dichotomy and not full index scan?

If the query plan isn’t convincing, you can look at the “items in”.

Hawkeye · May 14, 2025, 3:11pm

Thanks! Makes sense.

It means also that there is no efficient way (meaning without full index scan) to “access” the index “through” SQL++ when there are ascending and descending keys mixed together. From an application standpoint it means that implementing pagination is essentially impossible in this case. That’s a pretty strong limitation.

mreiche · May 14, 2025, 3:31pm

No.

Write your query in the query tab.
click on Index Advisor
create the suggested indexes
run the query
in the event the the query is not efficient as you think it should be, post the query and the indexes and the plan and someone can look at it

btw - a keyset pagination query will have a lower bound, be sorted, and a LIMIT instead of an upper bound.

Hawkeye · May 14, 2025, 3:44pm

The objective hasn’t changed, it is in the title: accessing a slice of any index. It’s just that to explain it, we have been through many examples and it took a long time to explain that it’s not a kdtree-like multidimensional search with independent conditions on each key. It was quite clear in the title that it was not that, and that indeed the problem was the one articulated, not another one.

Now an index, as in the title, can have a mix of ascending and descending keys. Your suggestion to obtain efficiently a slice of the index is to create the index on a JSON array of the keys. That is not compatible with a mix of ascending and descending keys. The only way to do that with Couchbase is to not have that array, like explained earlier:

"CREATE INDEX ix1 ON default(a,b);
SELECT RAW META().id FROM default AS d WHERE [a,b] >= [1,1] AND [a,b] <= [2,1]

If it works just as well, it would allow with small changes to support a mix of ascending and descending keys, provided that the select expresses a lexicographic comparison with the proper boolean expression, instead of comparing with JSON arrays."

Then it would be possible to paginate. But apparently that doesn’t work, the execution plan does a full index scan. That’s what we see when we try it in the console.

mreiche · May 14, 2025, 4:11pm

Your suggestion to obtain efficiently a slice of the index is to create the index on a JSON array of the keys.

My suggestion was to create the indexes suggested by the Index Advisor for the specific query. Since the query had a predicate on a json array, the index advisor suggested an index on the json array. If selecting by that json array is not the only way you want to query the data, then possibly separate indexes on the array elements would give that flexibility.

That is not compatible with a mix of ascending and descending keys.

What do you mean by a mix of ascending and descending keys? Can you show a query?

If it works just as well,

Did you try it? What did was the resulting query plan? Also - if, after executing, you click on Query Plan, it shows the results of executing the query plan (items in, items out, timing etc)

Topic		Replies	Views
Create Secondary Index on Array SQL++ index	14	2925	October 27, 2017
Help for optimizing SQL++	46	2064	April 30, 2019
Couchbase 4.5 Order by diffrence in performnace for ASC and DESC even query covered Couchbase Server n1ql	61	6838	March 29, 2019
N1QL query slow Couchbase Server n1ql , index	32	4818	November 7, 2016
Long query exec time despite using index SQL++	7	1505	July 5, 2018

Access to index without SQL++ query

Related topics