Couchbase 4.1.1 CE - slow query when no limit is specified

Hi,

Given the document structure:
{ "id" : "123", "type" : "data", "value" : "some value", "created" : 1478109980000, owner : "some username"}

And created index:
CREATE INDEXowner_type_created_descONsensorbucket(owner,type,(-created)) USING GSI

When I run following query it runs really slow:
select * from sensorbucket use index (owner_type_created_desc) where owner = 'milan' and type = 'data' and created >= 1481719427523;

Here’s the explain output:

{
"requestID": "d940371a-41a5-491a-85ed-ec99fa9d23e2",
"signature": "json",
"results": [
    {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan",
                "index": "owner_type_created_desc",
                "keyspace": "sensorbucket",
                "namespace": "default",
                "spans": [
                    {
                        "Range": {
                            "High": [
                                "\"milan\"",
                                "successor(\"data\")"
                            ],
                            "Inclusion": 1,
                            "Low": [
                                "\"milan\"",
                                "\"data\""
                            ]
                        }
                    }
                ],
                "using": "gsi"
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Fetch",
                            "keyspace": "sensorbucket",
                            "namespace": "default"
                        },
                        {
                            "#operator": "Filter",
                            "condition": "((((`sensorbucket`.`owner`) = \"milan\") and ((`sensorbucket`.`type`) = \"data\")) and (1481719427523 \u003c= (`sensorbucket`.`created`)))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "self",
                                    "star": true
                                }
                            ]
                        },
                        {
                            "#operator": "FinalProject"
                        }
                    ]
                }
            }
        ]
    }
],
"status": "success",
"metrics": {
    "elapsedTime": "1.91893ms",
    "executionTime": "1.876985ms",
    "resultCount": 1,
    "resultSize": 2217
}

But when I add limit that is smaller than the number of available results:
select * from sensorbucket use index (owner_type_created_desc) where owner = 'milan' and type = 'data' and created >= 1481719427523 limit 5;

it runs 10 times faster. I guess that in that case the query actually utilizes the index. If I omit ‘limit’ clause, or if limit is larger than the number of results that satisfy the query in that case the query is slow.

Am I missing something?

How many results, and how long does it take?

You can also try a covering index.

DB has 90k of documents. There are ~2500 docs where owner = 'milan'. Only 3 documents satisfy all three conditions owner = 'milan' and type = 'data' and created >= 1481719427523. It takes ~1.6s to fetch them. If I specify limit 2 it takes 131ms. My wild guess is that range query scans all the documents that satisfy equality part of the query. It would be useful if explain contained the number of scanned documents.

I also tried using covering index (I just adjusted the query to fetch only the indexed fields), it did not have effect.

EDIT: I think I managed to reproduce this on 4.5.1 EE version as well. I tried it locally on my machine with 15k documents, so results are fetched much faster (500ms), but adding limit also makes query run 10x faster (~50ms).

EDIT 2: The time needed to fetch the results increases significantly when more documents are added to the database. For 62k documents it takes 2.3 seconds to fetch the results on my workstation.

Can you post the CREATE INDEX statements you are using. You can also try putting created first in the index.

It’s the same index as the one I already posted:

CREATE INDEX owner_type_created_desc ON sensorbucket(owner,type,(-created)) USING GSI

I dropped that index and created the one you suggested:

create index created_desc_owner_type ON sensorbucket((-created),owner,type) using gsi;

And now the query does not use index at all.

cbq> explain select * from sensorbucket where owner = 'milan' and type = 'data' and created >= 1481744471958;
{
    "requestID": "2a74ffc8-3202-457a-a366-adb9c167f2e3",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "PrimaryScan",
                        "index": "#primary",
                        "keyspace": "sensorbucket",
                        "namespace": "default",
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "sensorbucket",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "((((`sensorbucket`.`owner`) = \"milan\") and ((`sensorbucket`.`type`) = \"data\")) and (1481744471958 \u003c= (`sensorbucket`.`created`)))"
                                },
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "self",
                                            "star": true
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            },
            "text": "select * from sensorbucket where owner = 'milan' and type = 'data' and created \u003e= 1481744471958"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.266268ms",
        "executionTime": "2.226883ms",
        "resultCount": 1,
        "resultSize": 1880
    }
}

Try both of these.

CREATE INDEX idx1 ON sensorbucket( owner, created ) WHERE type = 'data';

select *
from sensorbucket use index (idx1)
where owner = 'milan' and type = 'data' and created >= 1481719427523;

CREATE INDEX idx2 ON sensorbucket( -created, owner ) WHERE type = 'data';

select *
from sensorbucket use index (idx2)
where owner = 'milan' and type = 'data' and -created <= -1481719427523;

The first one does not return data in descending order.

The second one appears to work.

Thanks!

If you want the results ordered you must use an ORDER BY. Otherwise, the order is not guaranteed.