Couchbase 4.1 N1QL - performance issue

Hi,

We would like to base our future product on Couchbase, but no matter what we do, query performance is very slow…

I'm running a single node on a server with 40 logical processors, 128 GB of RAM, and fast HDDs.
I have a single bucket with only 300K documents (our target is 10B per customer). Each document contains 20 integers and a ~10 KB image (as a byte array).

I set up primary and secondary indexes:

  • CREATE PRIMARY INDEX #primary ON suspectentity USING GSI
  • CREATE INDEX IndexBasic ON suspectentity(time,channelId) USING GSI

I set up 20 GB of RAM for the bucket and 100 GB of RAM for the indexes.

I query only by time range and channelId, with a maximum of 10K documents per result set. For example:
SELECT * from suspectentity where time >= '2016-01-31T09:00' and time <= '2016-01-31T13:00' and channelId=12 LIMIT 10000;

The EXPLAIN shows that CB uses the correct index scan:
cbq.exe> EXPLAIN SELECT * from suspectentity where time >= '2016-01-31T09:00' and time <= '2016-01-31T13:00' and channelId=12 LIMIT 10000;
{
  "requestID": "550a42a7-1691-40af-9956-de92d0fd746e",
  "signature": "json",
  "results": [
    {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "IndexBasic",
              "keyspace": "suspectentity",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"2016-01-31T13:00\"",
                      "12"
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"2016-01-31T09:00\"",
                      "12"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Fetch",
                    "keyspace": "suspectentity",
                    "namespace": "default"
                  },
                  {
                    "#operator": "Filter",
                    "condition": "(((\"2016-01-31T09:00\" \u003c= (suspectentity.time)) and ((suspectentity.time) \u003c= \"2016-01-31T13:00\")) and ((suspectentity.channelId) = 12))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "self",
                        "star": true
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "10000"
        }
      ]
    }
  ],
  "status": "success",
  "metrics": {
    "elapsedTime": "17.0171ms",
    "executionTime": "17.0171ms",
    "resultCount": 1,
    "resultSize": 2898
  }
}

The query time is more than 60 seconds (for a 10K result set), while on MS SQL Server we get results very fast (1-3 seconds). Please note that if I take only 10 rows (and not 10K), the results come back in 1-2 seconds.

Can anyone please help us understand the problem, or tell us whether this is the expected performance?

No one? Please help me with this issue.

@Oren_Deri, apologies for the delay.
There are a few things you need to know when comparing a single-node deployment to relational databases. I have posted a similar response here.

A few additional questions on your query:

thanks
-cihan

Thanks for your support.

Instead of SELECT * I'm now using specific attributes.
I ran the same query:
SELECT id, channelId, time, isCdbSource, metadata FROM suspectentity where time >= '2016-02-01T09:00:00.0000000Z' and time <= '2016-02-01T09:50:59.9900000Z' and channelId = 12 LIMIT 10001;

It took 70s with LIMIT 10001 (returning ~8000 results).
It took 14s with LIMIT 1000.
It took 2s with LIMIT 100.

I'm using cbq.exe and my app uses the .NET SDK, but I do not understand how to set the max_parallelism parameter (can you please specify how?).

Is there a way to profile a query run and see whether I am hitting a CPU, RAM or disk limitation?

@cihangirb, I managed to change the max_parallelism parameter via the .NET SDK.
When running a single query it gives much better results: 30s (instead of 70s).
When running 5 parallel queries the performance is the same (70s).

Still, 30s to get 10K documents is very poor and not feasible for our system.

We really want to use Couchbase as our primary data store.

Any other ideas?

Performance is always complex, so we need to iterate a few times to make sure we can reach your final target.

A few more suggestions, @Oren_Deri:

  • Let's work on your final target workload: are you expecting to run 5 concurrent executions, or will you be putting on a load that exercises both mutations and queries at the same time?
  • Use covering indexes. To cover the query, please use the following index:
    CREATE INDEX IndexBasic ON suspectentity(time,channelId, id, isCdbSource, metadata) USING GSI;
    This will cover the query with the index. If you don't see an improvement, please send me the EXPLAIN and I can help make sure it is covering the query (see the sketch after this list).
  • I'd recommend you separate out the query, index and data services. These components compete with each other on the same node and can step on each other. There will be a much greater degree of variance in query performance with all services on one node.
  • I assume you are on 4.1 Enterprise Edition, is that right? The Community Edition does not perform well due to limitations.
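
Roughly, the covered version would look like the sketch below (the index is the one suggested above and the SELECT is your earlier query; you would need to drop the existing IndexBasic or pick a new name first):

CREATE INDEX IndexBasic ON suspectentity(time, channelId, id, isCdbSource, metadata) USING GSI;

EXPLAIN SELECT id, channelId, time, isCdbSource, metadata
FROM suspectentity
WHERE time >= '2016-02-01T09:00:00.0000000Z'
  AND time <= '2016-02-01T09:50:59.9900000Z'
  AND channelId = 12
LIMIT 10001;

If the index covers the query, the IndexScan in the plan should show a "covers" section and there should be no Fetch operator, so the ~10 KB image blobs are never read for this query.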

thanks
-cihan

Hi @Oren_Deri,

Are you doing a sort – ORDER BY + LIMIT?

If so, you will see a significant speedup in the next release; the Developer Preview will be out soon. Please see the ticket here: https://issues.couchbase.com/browse/MB-17202
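
For clarity, by a sort I mean a top-N query of roughly this shape (the values here are purely illustrative, not taken from your workload):

SELECT id, channelId, time
FROM suspectentity
WHERE channelId = 12
  AND time BETWEEN '2016-02-01T09:00:00.0000000Z' AND '2016-02-01T09:50:59.9900000Z'
ORDER BY time DESC
LIMIT 10000;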

We want to be your primary datastore. We will work with you to get the expected performance.

Thanks,
Gerald

I only use LIMIT 10000; I do not use ORDER BY. Thanks for your support. I can provide any information needed.

OK, there are a couple of things going on here. First, you should try to get a covering index, as pointed out by @cihangirb.

Second, the N1QL query engine is not as aggressive as it could be in pushing down LIMIT to the index scan. This is because we wanted to exploit parallelism instead. Your query above is the most adverse case for us, because it has only a LIMIT. That is a good reason for us to revisit the pushdown and try to be more aggressive.
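
As an illustration only (not something we have benchmarked on your data; the index and keys are the ones already discussed in this thread), the pushdown can be emulated by hand: run a key-only query that applies the LIMIT, then fetch just those documents by key in a second statement.

/* Step 1: key-only query. With the index on (time, channelId) this should be
   answerable from the index alone, so the 10 KB blobs are not fetched;
   verify with EXPLAIN that the plan has no Fetch operator. */
SELECT META(s).id FROM suspectentity s
WHERE s.time >= '2016-02-01T09:00:00.0000000Z'
  AND s.time <= '2016-02-01T09:50:59.9900000Z'
  AND s.channelId = 12
LIMIT 10000;

/* Step 2: fetch only those documents, using the keys returned by step 1
   ("key-1" and "key-2" are placeholders). */
SELECT s.* FROM suspectentity s
USE KEYS ["key-1", "key-2"];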

@cihangirb thanks for your help :slightly_smiling:

Some more information:

  • My target is to run 10-20 concurrent queries. Each query should return up to 10K documents (LIMIT 10000). Each query will have a different WHERE clause, but all on the same bucket. I have only a single bucket, and all documents have the same structure/schema. The final target is to have 10B documents, but I have performance issues with 1M, so I haven't tried more than that.
  • I use covering indexes: one for time and channelId, and one for the rest of the properties:
    CREATE INDEX IndexBasic ON suspectentity(time,channelId);
    CREATE INDEX IndexMetadata ON suspectentity(metadata.clothingShirtColor, metadata.clothingPantsColor, metadata.movementDirection, metadata.height);
    When I merged them into a single index the performance was worse, so I ended up with these two indexes.
  • A question regarding indexes: why do I need the "id" in the index? It's the primary key of the document.
  • Note that each document also contains a binary blob that represents an image (~10 KB). Some queries ask for the image and some only query integer-based properties, so in many cases we will need to hit the disk to get the results.
  • Do you think that for only 1M documents, with 128 GB of RAM and 40 logical processors, I need 3 separate machines? That would make our solution very expensive.

If you need any additional information, please let me know.

Please note that currently I'm just testing the query that filters by time range and channelId, since this is the most popular query. Such a query should return up to 10K documents, and it sometimes returns the image binary blob, which is part of the document content.

I gave the bucket 20 GB of RAM and the indexes 100 GB of RAM.

@Oren_Deri - thanks for the details. Could you email me at cihan@couchbase.com so we can work through the details of your goals?
I have a few asks:

  • What is the size of each document? With that I can provide better sizing for both the 1M and 10B item counts on the cluster.
  • For query tuning, let's work on one query at a time. If you can send me the queries you'd like to run over email, we can look at indexing in each case and find the best balance of indexes.
    BTW, the reason I added id to the mix is that I saw you refer to id in the SELECT list of your previous query (see the note after this list for an alternative):
    SELECT id, channelId, time, isCdbSource, metadata FROM suspectentity where time >= '2016-02-01T09:00:00.0000000Z' and time <= '2016-02-01T09:50:59.9900000Z' and channelId = 12 LIMIT 10001;
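
One alternative, if your id attribute simply duplicates the document key (that is an assumption on my part), is to select the key via META() instead, so the index does not have to carry id as a key. The index name IndexCover below is just a placeholder, and it is worth confirming with EXPLAIN that the query is still covered on 4.1:

CREATE INDEX IndexCover ON suspectentity(time, channelId, isCdbSource, metadata) USING GSI;

SELECT META(s).id AS id, s.channelId, s.time, s.isCdbSource, s.metadata
FROM suspectentity s
WHERE s.time >= '2016-02-01T09:00:00.0000000Z'
  AND s.time <= '2016-02-01T09:50:59.9900000Z'
  AND s.channelId = 12
LIMIT 10001;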

To answer your question from the last post:

  • On the deployment model, 1 server vs. 3: I'd recommend that you test on what you plan to deploy. As HA and multiple nodes get into the picture, the performance characteristics will be different. It is also hard for me to tell whether the query latency is due to contention among services; separating the services would give you a clearer picture. We can look at system stats to tell where we are bottlenecked, but we have a few options to move things around to give you better throughput.

Also, @geraldss is right about the LIMIT optimization. We have seen the ORDER BY + LIMIT optimization that is in the coming release make a huge impact on some existing cases. It would be great to try some of this on our new build as well.
thanks
-cihan

@cihangirb - I just sent you an email with the requested information. Many thanks :slightly_smiling:

Hi @cihangirb, I have just sent you an e-mail at cihan@couchbase.com. I'm having a similar problem to @Oren_Deri's, and I think you can help me. I did not post it here because I would be repeating the same issue, and I think it is better to ask you privately. Thanks.

Hi @raphhaell,

Please post your issue in full as a new topic and we can help you.