Couchbase 4.5 indexscancounter missing for array indexing

krishnasahu2016 · May 7, 2016, 3:09pm

Created index on array and count is slow.

my index definition

create index index_name_array on default(DISTINCT ARRAY name FOR name IN names END,names)

Query

explain Select count(1) from default where ANY name IN names SATISFIES name like ‘test%’ END

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“covers”: [
“cover ((distinct (array name for name in (default.names) end)))”,
“cover ((default.names))”,
“cover ((meta(default).id))”
],
“index”: “index_name_array”,
“index_id”: “70e3dae0425987d”,
“keyspace”: “default”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“tesu”"
],
“Inclusion”: 1,
“Low”: [
"“test”"
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “any name in cover ((default.names)) satisfies (name like “test%”) end”
},
{
"#operator": “InitialGroup”,
“aggregates”: [
“count(1)”
],
“group_keys”: []
}
]
}
},
{
"#operator": “IntermediateGroup”,
“aggregates”: [
“count(1)”
],
“group_keys”: []
},
{
"#operator": “FinalGroup”,
“aggregates”: [
“count(1)”
],
“group_keys”: []
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “count(1)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “Select count(1) from default where ANY name IN names SATISFIES name like ‘test%’ END”
}
]

No IndexScanCounter and query is slow in getting count

ingenthr · May 7, 2016, 5:58pm

I moved this to the proper category, just FYI.

geraldss · May 8, 2016, 6:31pm

Hi @krishnasahu2016,

This is an interesting observation. You are correct that we cannot push the COUNT down to the index in this case.

The following two indexes should work:

CREATE INDEX idx_any ON default( ANY name IN names SATISFIES name like 'test%' END, names );

CREATE INDEX idx_any ON default( ANY name IN names SATISFIES name like 'test%' END, names ) WHERE ANY name IN names SATISFIES name like 'test%' END;

krishnasahu2016 · May 9, 2016, 5:14pm

This adds limitation on using array indexing as count is most used for pagination and limitaion.

Cannot create index as suggested as filter value test% is not fixed.

geraldss · May 9, 2016, 5:27pm

We will work on this. Pagination is very important for us, because we are focused on interactive web apps.

The challenge with array indexing is that the result of the scan must be de-duped, so if we push the COUNT down to the index, we must also push down the de-dup. Anyway, this is our problem, not yours We will add this to the perf roadmap, thank you.

geraldss · May 10, 2016, 1:49pm

@krishnasahu2016,

One more note. With N1QL, you don’t have to an explicit COUNT in order to implement pagination. When you do ORDER BY / LIMIT / OFFSET, N1QL returns a sortCount value in the metrics, which is the total number of items that were sorted. Same as your explicit COUNT. So you save that entire operation.

krishnasahu2016 · May 10, 2016, 5:09pm

Can you please share syntax of order by for array

as when i use this its very slow

Select META().id from default where ANY name IN names SATISFIES name like ‘test%’ END order by names LIMIT 30 OFFSET 0

geraldss · May 10, 2016, 5:17pm

I don’t think you want ORDER BY names. If your app is displaying pages, what do you want the pages sorted on? It is typically a timestamp, an alphabetical name, or a numeric metric. Not an array.

krishnasahu2016 · May 10, 2016, 5:19pm

interested in sorting by numeric

krishnasahu2016 · May 10, 2016, 5:26pm

facing issue if i add array to index order by slow.

e.g.

create index index_name_array on default(DISTINCT ARRAY name FOR name IN names END,age,names)

Select META().id from default where ANY name IN names SATISFIES name like ‘test%’ END order by age LIMIT 30 OFFSET 0

slow order by performance

vsr1 · May 10, 2016, 6:08pm

Due to index has array keys and order by keys are not leading keys of index, query can’t take advantage of index order and requires sort.

At present the following plan is optimal.

explain Select  META().id from default where ANY name IN names SATISFIES name like 'test%' END  order by age LIMIT 30 OFFSET 0;
{
    "requestID": "bfe8477b-e47b-44b0-a472-3a86cff4b8f3",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "DistinctScan",
                                "scan": {
                                    "#operator": "IndexScan",
                                    "covers": [
                                        "cover ((distinct (array `name` for `name` in (`default`.`names`) end)))",
                                        "cover ((`default`.`age`))",
                                        "cover ((`default`.`names`))",
                                        "cover ((meta(`default`).`id`))"
                                    ],
                                    "index": "index_name_array",
                                    "index_id": "4b347402735c4cfc",
                                    "keyspace": "default",
                                    "namespace": "default",
                                    "spans": [
                                        {
                                            "Range": {
                                                "High": [
                                                    "\"tesu\""
                                                ],
                                                "Inclusion": 1,
                                                "Low": [
                                                    "\"test\""
                                                ]
                                            }
                                        }
                                    ],
                                    "using": "gsi"
                                }
                            },
                            {
                                "#operator": "Parallel",
                                "~child": {
                                    "#operator": "Sequence",
                                    "~children": [
                                        {
                                            "#operator": "Filter",
                                            "condition": "any `name` in cover ((`default`.`names`)) satisfies (`name` like \"test%\") end"
                                        },
                                        {
                                            "#operator": "InitialProject",
                                            "result_terms": [
                                                {
                                                    "expr": "cover ((meta(`default`).`id`))"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            }
                        ]
                    },
                    {
                        "#operator": "Order",
                        "limit": "30",
                        "offset": "0",
                        "sort_terms": [
                            {
                                "expr": "cover ((`default`.`age`))"
                            }
                        ]
                    },
                    {
                        "#operator": "Offset",
                        "expr": "0"
                    },
                    {
                        "#operator": "Limit",
                        "expr": "30"
                    },
                    {
                        "#operator": "FinalProject"
                    }
                ]
            },
            "text": "Select  META().id from default where ANY name IN names SATISFIES name like 'test%' END  order by age LIMIT 30 OFFSET 0"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "8.865484ms",
        "executionTime": "8.835204ms",
        "resultCount": 1,
        "resultSize": 3904
    }
}

vsr1 · May 10, 2016, 6:53pm

As Gerald suggested earlier by using partial index with modified query can perform better.

create index index_name_array on default(age) where ANY name IN names SATISFIES name like ‘test%’ END;

Select META().id from default where age is not missing and ANY name IN names SATISFIES name like ‘test%’ END order by age LIMIT 30 OFFSET 0;

The following is more generic one (name like “test%” equivalent to name >= “test” and name < “tesu”). Different subset of like clauses can use same index.

create index index_name_array on default(age,names) where ANY name IN names SATISFIES name >= “test” and name < “tesu” END;
explain Select META().id from default where age is not null and ANY name IN names SATISFIES name like ‘test%’ END order by age LIMIT 30 OFFSET 0;
explain Select META().id from default where age is not null and ANY name IN names SATISFIES name like ‘testa%’ END order by age LIMIT 30 OFFSET 0;

krishnasahu2016 · May 11, 2016, 3:43am

Sorry cannot use suggestions to create index like (name like “test%” equivalent to name >= “test” and name < “tesu”) as i will end up with infinite number of indexes

This needs fix as order by is very very slow even for 100K records if i use array indexing

krishnasahu2016 · May 14, 2016, 3:30pm

HI @geraldss

Can you please look into the issue as my thinking is this is critical for using array indexing