Hi,
I just noticed that a simple count query takes over 2+ seconds.
select count(*) from bucket where aField=fieldValue
I have index on aField and # of total items is 17000.
Why does it take so long?
Hi,
I just noticed that a simple count query takes over 2+ seconds.
select count(*) from bucket where aField=fieldValue
I have index on aField and # of total items is 17000.
Why does it take so long?
Whats your EXPLAIN saying
EXPLAIN select count(*) from bucket where aField=fieldValue;
Could you output it.
Hello @househippo
Explain output:
{
"requestID": "a93670ab-ab52-4dbf-ab71-7b937e172dec",
"signature": "json",
"results": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "idx_form",
"keyspace": "mybucket",
"limit": 9.223372036854776e+18,
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"product\""
],
"Inclusion": 3,
"Low": [
"\"product\""
]
},
"Seek": null
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Fetch",
"keyspace": "mybucket",
"namespace": "default"
},
{
"#operator": "Filter",
"condition": "((`mybucket`.`form`) = \"product\")"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "count(*)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
}
],
"status": "success",
"metrics": {
"elapsedTime": "4.100504ms",
"executionTime": "4.043493ms",
"resultCount": 1,
"resultSize": 3026
}
}
Please switch to the 4.5 Developer Preview or (better) the upcoming 4.5 Beta. This exact use case has been much improved.
We cannot afford using a DP or Beta on production.
Is it ever going to be fixed (is this even a bug?)? If not, are there any other alternatives?
This is a MUST for pagination feature.
Hi there,
which version of couchbase are you using?
I have the same problem, the 4.5 DP has a really great improvement like geraldss said.
The count queries get a 87% time improvement (in my system).
But like you said, not production ready.
You can try the Couchbase Server 4.1 version, this version has covering indexes that will improve the response time, but it’s Enterprise Edition only for now.
Not a bug, but 4.5 has performance improvements for this. If you must use a GA version, then it should be 4.5 GA.
Thank you @rafael.felix
I appreciate your input. We’ve been testing with 4.0 for while, but we will consider 4.1 enterprise license once we are 100% close to the production version. It’s sad that a query engine has so slow performance for such a simple use case.
Are there any other alternatives for this use case? Probably it is only me, but select count(*) is most basic and widely used. I’m not sure how it is normal for it to take 2+ seconds. We will try out 4.1 or 4.5.
Yes, this is all fixed, in addition to other features for pagination (e.g. sortCount in the metrics for ORDER BY queries). I would recommend that you evaluate 4.1 or 4.5 for production, and when satisfied, then you can deploy it when ready. It is not useful to evaluate 4.0 for a future 4.x production deployment.
Thanks,
Gerald
In 4.1, Alternative for this use case is covered index with count(1)
Example:
create index ix1 on bucket(aField);
select count(1) from bucket where aField=fieldValue;
Thank you. It looks like it is much faster on 4.1.
A little update.
We’ve been inserting documents and we now have over 4,000,000 documents.
It was faster (I think) yesterday when we had less number of documents.
It still takes 2+ seconds with cover index.
Explain:
cbq> explain select count(1) from bucket where form='something';
{
"requestID": "9b666f80-7e88-4806-b016-54616ecdf99d",
"signature": "json",
"results": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"covers": [
"cover((meta(`bucket`).`id`))",
"cover((`bucket`.`form`))"
],
"index": "idx_creative_form",
"keyspace": "bucket",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"something\""
],
"Inclusion": 3,
"Low": [
"\"something\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(cover((`bucket`.`form`)) = \"something\")"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "count(1)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
}
],
"status": "success",
"metrics": {
"elapsedTime": "9.888334ms",
"executionTime": "9.841021ms",
"resultCount": 1,
"resultSize": 2890
}
}
Execution:
{
"requestID": "8bcb51c6-b556-4b17-9aef-07b5b26b73b5",
"signature": {
"$1": "number"
},
"results": [
{
"$1": 336068
}
],
"status": "success",
"metrics": {
"elapsedTime": "2.242253945s",
"executionTime": "2.242218257s",
"resultCount": 1,
"resultSize": 36
}
}
It looks like we HAVE to move to 4.5 in order to get faster response or am I missing something with my query?
Nothing missing in query and the plan looks optimal for 4.1
In 4.5, this COUNT is pushed down to the indexer for better performance.
@geraldss I understand that it is best to test 4.5 on our end, but how faster is it compare to 4.1? How was it on your internal benchmarks? Is this at least 100%+ faster in 4.5? It takes at least 2+ seconds for such a simple count query, which usually takes less than 300ms in RDMS. Unless it is at least 100%+, it is still not usable for even a simple pagination use case in my opinion. Should I look for other solutions such as view for aggregation or is 4.5 on par with RDMS?
COUNT, LIMIT, and pagination in general are much faster in 4.5. The Beta will be out shortly.
I understand that. Do you have benchmark on them? If it is not fast enough, I think it’s best for us to start using views as soon as possible instead of waiting for 4.5 to come out.
You can use views in 4.5. I would recommend you test the 4.5 Beta. That way, you will have the flexibility to use views, N1QL, or both for each use case. The benefits are significant. I don’t have an exact timing for counting 4M documents, but it should be on par.
Hi @geraldss
I installed 4.5 DP locally and inserted 621027 documents with the following content.
{
"form": "test"
}
Then created primary index and index on form with the following query.
create primary index on default using gsi
create index idx_form on default(form) using gsi
Explain output
{
"requestID": "bd1b9c55-e517-4227-810e-7486c7d02d0d",
"signature": "json",
"results": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"covers": [
"cover ((`default`.`form`))",
"cover ((meta(`default`).`id`))"
],
"index": "idx_form",
"keyspace": "default",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"\"test\""
],
"Inclusion": 3,
"Low": [
"\"test\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(cover ((`default`.`form`)) = \"test\")"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(1)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "count(1)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
}
],
"status": "success",
"metrics": {
"elapsedTime": "2.889656ms",
"executionTime": "2.850574ms",
"resultCount": 1,
"resultSize": 2884
}
}
count(1) output:
cbq> select count(1) from default where form='test';
{
"requestID": "0b7bcf72-9275-4913-9370-561d68e5f0e9",
"signature": {
"$1": "number"
},
"results": [
{
"$1": 621027
}
],
"status": "success",
"metrics": {
"elapsedTime": "9.646269496s",
"executionTime": "9.646205144s",
"resultCount": 1,
"resultSize": 36
}
}
So even 4.5 is not optimized for count() query. 9 seconds is simply not usable. 9 seconds is actually fastest. Avg. is 16 seconds.
I know that it might not be a good idea to compare n1ql to RDMS, but here is output from mysql.
MYSQL: select count(*) from test where form=‘test’;
Output: 284ms
Am I missing something?
Thanks,
Moon