Simple count(*) query takes 2+ seconds

moon0326 · April 17, 2016, 9:36am

Hi,

I just noticed that a simple count query takes over 2+ seconds.

select count(*) from bucket where aField=fieldValue

I have index on aField and # of total items is 17000.

Why does it take so long?

househippo · April 17, 2016, 6:15pm

Whats your EXPLAIN saying

EXPLAIN select count(*) from bucket where aField=fieldValue;

Could you output it.

moon0326 · April 17, 2016, 7:07pm

Hello @househippo

Explain output:

{
    "requestID": "a93670ab-ab52-4dbf-ab71-7b937e172dec",
    "signature": "json",
    "results": [
        {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "IndexScan",
                    "index": "idx_form",
                    "keyspace": "mybucket",
                    "limit": 9.223372036854776e+18,
                    "namespace": "default",
                    "spans": [
                        {
                            "Range": {
                                "High": [
                                    "\"product\""
                                ],
                                "Inclusion": 3,
                                "Low": [
                                    "\"product\""
                                ]
                            },
                            "Seek": null
                        }
                    ],
                    "using": "gsi"
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "Fetch",
                                "keyspace": "mybucket",
                                "namespace": "default"
                            },
                            {
                                "#operator": "Filter",
                                "condition": "((`mybucket`.`form`) = \"product\")"
                            },
                            {
                                "#operator": "InitialGroup",
                                "aggregates": [
                                    "count(*)"
                                ],
                                "group_keys": []
                            }
                        ]
                    }
                },
                {
                    "#operator": "IntermediateGroup",
                    "aggregates": [
                        "count(*)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "FinalGroup",
                    "aggregates": [
                        "count(*)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "InitialProject",
                                "result_terms": [
                                    {
                                        "expr": "count(*)"
                                    }
                                ]
                            },
                            {
                                "#operator": "FinalProject"
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "4.100504ms",
        "executionTime": "4.043493ms",
        "resultCount": 1,
        "resultSize": 3026
    }
}

geraldss · April 17, 2016, 9:30pm

Please switch to the 4.5 Developer Preview or (better) the upcoming 4.5 Beta. This exact use case has been much improved.

moon0326 · April 18, 2016, 3:15am

We cannot afford using a DP or Beta on production.

Is it ever going to be fixed (is this even a bug?)? If not, are there any other alternatives?

This is a MUST for pagination feature.

rafael.felix · April 18, 2016, 11:07am

Hi there,
which version of couchbase are you using?
I have the same problem, the 4.5 DP has a really great improvement like geraldss said.
The count queries get a 87% time improvement (in my system).
But like you said, not production ready.
You can try the Couchbase Server 4.1 version, this version has covering indexes that will improve the response time, but it’s Enterprise Edition only for now.

geraldss · April 18, 2016, 3:33pm

Not a bug, but 4.5 has performance improvements for this. If you must use a GA version, then it should be 4.5 GA.

moon0326 · April 18, 2016, 6:21pm

Thank you @rafael.felix

I appreciate your input. We’ve been testing with 4.0 for while, but we will consider 4.1 enterprise license once we are 100% close to the production version. It’s sad that a query engine has so slow performance for such a simple use case.

moon0326 · April 18, 2016, 6:23pm

Are there any other alternatives for this use case? Probably it is only me, but select count(*) is most basic and widely used. I’m not sure how it is normal for it to take 2+ seconds. We will try out 4.1 or 4.5.

geraldss · April 18, 2016, 7:05pm

Yes, this is all fixed, in addition to other features for pagination (e.g. sortCount in the metrics for ORDER BY queries). I would recommend that you evaluate 4.1 or 4.5 for production, and when satisfied, then you can deploy it when ready. It is not useful to evaluate 4.0 for a future 4.x production deployment.

Thanks,
Gerald

vsr1 · April 21, 2016, 6:04am

In 4.1, Alternative for this use case is covered index with count(1)

Example:

create index ix1 on bucket(aField);
select count(1) from bucket where aField=fieldValue;

moon0326 · April 21, 2016, 7:00am

Thank you. It looks like it is much faster on 4.1.

moon0326 · April 21, 2016, 6:34pm

A little update.

We’ve been inserting documents and we now have over 4,000,000 documents.

It was faster (I think) yesterday when we had less number of documents.

It still takes 2+ seconds with cover index.

Explain:

cbq> explain select count(1) from bucket where form='something';
{
    "requestID": "9b666f80-7e88-4806-b016-54616ecdf99d",
    "signature": "json",
    "results": [
        {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "IndexScan",
                    "covers": [
                        "cover((meta(`bucket`).`id`))",
                        "cover((`bucket`.`form`))"
                    ],
                    "index": "idx_creative_form",
                    "keyspace": "bucket",
                    "namespace": "default",
                    "spans": [
                        {
                            "Range": {
                                "High": [
                                    "\"something\""
                                ],
                                "Inclusion": 3,
                                "Low": [
                                    "\"something\""
                                ]
                            }
                        }
                    ],
                    "using": "gsi"
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "Filter",
                                "condition": "(cover((`bucket`.`form`)) = \"something\")"
                            },
                            {
                                "#operator": "InitialGroup",
                                "aggregates": [
                                    "count(1)"
                                ],
                                "group_keys": []
                            }
                        ]
                    }
                },
                {
                    "#operator": "IntermediateGroup",
                    "aggregates": [
                        "count(1)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "FinalGroup",
                    "aggregates": [
                        "count(1)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "InitialProject",
                                "result_terms": [
                                    {
                                        "expr": "count(1)"
                                    }
                                ]
                            },
                            {
                                "#operator": "FinalProject"
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "9.888334ms",
        "executionTime": "9.841021ms",
        "resultCount": 1,
        "resultSize": 2890
    }
}

Execution:

{
    "requestID": "8bcb51c6-b556-4b17-9aef-07b5b26b73b5",
    "signature": {
        "$1": "number"
    },
    "results": [
        {
            "$1": 336068
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.242253945s",
        "executionTime": "2.242218257s",
        "resultCount": 1,
        "resultSize": 36
    }
}

It looks like we HAVE to move to 4.5 in order to get faster response or am I missing something with my query?

vsr1 · April 21, 2016, 6:53pm

Nothing missing in query and the plan looks optimal for 4.1

geraldss · April 21, 2016, 7:20pm

In 4.5, this COUNT is pushed down to the indexer for better performance.

moon0326 · April 22, 2016, 8:54am

@geraldss I understand that it is best to test 4.5 on our end, but how faster is it compare to 4.1? How was it on your internal benchmarks? Is this at least 100%+ faster in 4.5? It takes at least 2+ seconds for such a simple count query, which usually takes less than 300ms in RDMS. Unless it is at least 100%+, it is still not usable for even a simple pagination use case in my opinion. Should I look for other solutions such as view for aggregation or is 4.5 on par with RDMS?

geraldss · April 22, 2016, 2:43pm

COUNT, LIMIT, and pagination in general are much faster in 4.5. The Beta will be out shortly.

moon0326 · April 22, 2016, 5:03pm

I understand that. Do you have benchmark on them? If it is not fast enough, I think it’s best for us to start using views as soon as possible instead of waiting for 4.5 to come out.

geraldss · April 22, 2016, 5:27pm

You can use views in 4.5. I would recommend you test the 4.5 Beta. That way, you will have the flexibility to use views, N1QL, or both for each use case. The benefits are significant. I don’t have an exact timing for counting 4M documents, but it should be on par.

moon0326 · April 24, 2016, 1:32am

Hi @geraldss

I installed 4.5 DP locally and inserted 621027 documents with the following content.

{
    "form": "test"
}

Then created primary index and index on form with the following query.

create primary index on default using gsi
create index idx_form on default(form) using gsi

Explain output

{
    "requestID": "bd1b9c55-e517-4227-810e-7486c7d02d0d",
    "signature": "json",
    "results": [
        {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "IndexScan",
                    "covers": [
                        "cover ((`default`.`form`))",
                        "cover ((meta(`default`).`id`))"
                    ],
                    "index": "idx_form",
                    "keyspace": "default",
                    "namespace": "default",
                    "spans": [
                        {
                            "Range": {
                                "High": [
                                    "\"test\""
                                ],
                                "Inclusion": 3,
                                "Low": [
                                    "\"test\""
                                ]
                            }
                        }
                    ],
                    "using": "gsi"
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "Filter",
                                "condition": "(cover ((`default`.`form`)) = \"test\")"
                            },
                            {
                                "#operator": "InitialGroup",
                                "aggregates": [
                                    "count(1)"
                                ],
                                "group_keys": []
                            }
                        ]
                    }
                },
                {
                    "#operator": "IntermediateGroup",
                    "aggregates": [
                        "count(1)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "FinalGroup",
                    "aggregates": [
                        "count(1)"
                    ],
                    "group_keys": []
                },
                {
                    "#operator": "Parallel",
                    "~child": {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "InitialProject",
                                "result_terms": [
                                    {
                                        "expr": "count(1)"
                                    }
                                ]
                            },
                            {
                                "#operator": "FinalProject"
                            }
                        ]
                    }
                }
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.889656ms",
        "executionTime": "2.850574ms",
        "resultCount": 1,
        "resultSize": 2884
    }
}

count(1) output:

cbq> select count(1) from default where form='test';
{
    "requestID": "0b7bcf72-9275-4913-9370-561d68e5f0e9",
    "signature": {
        "$1": "number"
    },
    "results": [
        {
            "$1": 621027
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "9.646269496s",
        "executionTime": "9.646205144s",
        "resultCount": 1,
        "resultSize": 36
    }
}

So even 4.5 is not optimized for count() query. 9 seconds is simply not usable. 9 seconds is actually fastest. Avg. is 16 seconds.

I know that it might not be a good idea to compare n1ql to RDMS, but here is output from mysql.

MYSQL: select count(*) from test where form=‘test’;
Output: 284ms

Am I missing something?

Thanks,
Moon

Topic		Replies	Views
Count is taking long time SQL++ query , n1ql	19	5412	March 29, 2016
N1SQl - Count Query is taking more than 2 seconds SQL++	1	714	April 23, 2021
Slow Count Query Using N!QL SQL++	4	988	February 20, 2019
Can I improve COUNT() performance? SQL++	16	2404	November 16, 2016
Count Query With Time Condition - Indexing Problem SQL++ query , n1ql , index	7	1445	January 11, 2018

Simple count(*) query takes 2+ seconds

Related topics