N1QL 200x slower than views for simple queries

The query needs a predicate on the index's leading key:

CREATE INDEX idx_val ON Fitness(val) WHERE type = 'fitsession';

SELECT val FROM Fitness WHERE type = 'fitsession' AND val IS NOT MISSING;

Thanks, I was finally able to run the test again. It improved, but it is still night and day compared to views.
Now it goes from 2 s to 25 s. And to make things worse, I noticed the previous data wasn't even 200K docs; that was a previous test. Now there are just 3,000 docs inside, and I am firing 100 parallel queries with the above results.
Views are consistent at 200-400 ms even if I leave them in development.
By the way, the results use adhoc false, and I am firing two consecutive batches of queries to give N1QL time to cache something; on the first run it's even worse.
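For what it's worth, adhoc false in the SDK maps roughly to server-side prepared statements, which can also be exercised by hand in cbq to take the SDK out of the loop; a sketch, with an arbitrary statement name:

```
PREPARE fit_vals FROM
  SELECT val FROM Fitness WHERE type = 'fitsession' AND val IS NOT MISSING;

EXECUTE fit_vals;
```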
Here is the explain:
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "covers": [
            "cover ((Fitness.val))",
            "cover ((meta(Fitness).id))"
          ],
          "filter_covers": {
            "cover ((Fitness.type))": "fitsession"
          },
          "index": "idx_val",
          "index_id": "2af99e8ad6e69d7a",
          "keyspace": "Fitness",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "Inclusion": 1,
                "Low": [
                  "null"
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "((cover ((Fitness.type)) = \"fitsession\") and (cover ((Fitness.val)) is not missing))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "cover ((Fitness.val))"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "select val from Fitness where type='fitsession' AND val is NOT MISSING"
  }
]

Is there nothing else we can do?

I made further tests. There was one mistake in the view benchmark: it was still using the development view. Although I had promoted the view to production, the dev view was still there and I was querying it.
Still, even like this, view performance is very stable: on those 3K docs it does 300 ms to 1 second.
I wanted to try 4.6 to see if things improved, and it's much better, executing at 2-12 seconds, but still miles away from views.

I tried another test, this time with a real 100K docs. If I fetch all of them it takes forever, so I made a query like this:
SELECT val FROM Fitness WHERE type = 'fitsession' AND val IS NOT MISSING AND val > 28
and on the view I start from key 28. The query returns around 10K docs.
With N1QL it takes 4-30 seconds; with views, 200 ms to 4 seconds.

Then I noticed something interesting. With views my network processes around 2 MB/s of data; with N1QL, 8-9 MB/s. Are you transferring the whole doc even though I only asked for the index?

By the way, in your internal tests, do you see N1QL being faster than views without relying on thousands of GB of RAM?

I don’t understand the numbers / symbols you are using. What are all the numbers in seconds and milliseconds?

I edited the above post to make it more clear.

Try the following on 4.6.

In cbq shell

set -pretty=false
set -max-parallelism=16

Then issue

SELECT RAW val
FROM fitness
WHERE type = 'fitsession' AND val > 28;

This could be on the right track, but I can't benchmark it. If I try to use this query in Node.js, the SDK crashes parsing the response.
Also, I have no way of setting the above in the Node SDK. In cbq it works fine, but I can't test the performance.

You cannot do any of these benchmarks using an SDK. If you want to compare views vs. N1QL, you need to compare them directly, without any additional variables. That probably explains all your previous results.

Since ultimately an SDK is what everybody uses in production, it does make sense to benchmark with it.
If the SDK really is the issue, then even if N1QL can perform equal to or better than views from a command-line utility, what's the point? Nobody would be able to benefit from it.
And from what I see, Node.js is one of the most frequently updated SDKs, reasonably so, since Node.js is probably the most common platform.
Anyway, like I said before, I doubt it's an SDK issue, since I can fire a query from cbq or the web interface while the benchmark is running and see the query taking forever (no need to even time it; it's clearly taking much more than 3 seconds).
That is, unless the SDK is overloading the server without reason, doing something it shouldn't.

@brett19 Could you comment on this issue? Other than the above missing parameter, does the SDK transfer more data than it should when using N1QL? Why am I seeing 4-5x more network usage when running queries that should have the same results?

Fair enough. Of course, you as a user should not be doing our benchmarking for us. Thank you :slight_smile:

But if you want a more precise analysis, you need to do some isolation. SDK isolation is the first part of that.

Unless you have an alternate way, I don't know how to avoid an SDK.
I could reproduce the benchmark with the REST API, but ultimately it would rely on the REST implementation, so it wouldn't be different from using an SDK.
Btw, I just printed the output of the query, and although N1QL uses 4 times the bandwidth, it's actually returning less data.
This is a view response:
{"key":28,"value":null,"id":"fit#56179d00fcc00000"}
and this is the N1QL response:
{28}

Something shady is going on here.
Anyway, I am benchmarking not for a pure comparison but ultimately to decide what to use in production.
Views are much more limited in use, so I would like to use N1QL, but not at this price.
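As an aside, an SDK-free reproduction is possible without cbq: the query service's REST endpoint accepts the same knobs as per-request parameters. A sketch with placeholder host and credentials (pretty and max_parallelism are standard request parameters):

```
curl -su Administrator:password http://localhost:8093/query/service \
  --data-urlencode "statement=SELECT RAW val FROM Fitness WHERE type = 'fitsession' AND val > 28" \
  --data-urlencode 'pretty=false' \
  --data-urlencode 'max_parallelism=16'
```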

Can you test this using the cbq shell with 4.6, just to get some numbers? I posted the settings and query earlier.

@geraldss

"executionTime": "505.457707ms",
"resultCount": 9617,
"resultSize": 13234
}
And this is the same query fired while the benchmark was running:
"status": "success",
"metrics": {
"elapsedTime": "2.75789652s",
"executionTime": "2.757850431s",
"resultCount": 9617,
"resultSize": 13234
}
}

Please note that it didn't actually take 2.7 seconds to answer the query; it was more like 10. I just noticed the query seems to be stalled, as if it's waiting for the others to finish first. If I input a query with an error in cbq while the benchmark is running, several seconds pass and then it answers:
"status": "fatal",
"metrics": {
"elapsedTime": "355.032µs",
"executionTime": "322.014µs",
"resultCount": 0,
"resultSize": 0,
"errorCount": 1
}

While it clearly didn't take 355.032 µs to answer.
But I can run view queries just fine while the 100 N1QL queries are running, so N1QL is definitely capped somewhere.

Did you apply the settings?

set -pretty=false
set -max-parallelism=16

Can you make sure you are not running any Node.js processes on the same machine where the query engine is running?

@geraldss
Yes I did. The first result is pure cbq alone, and it did actually complete in 500 milliseconds.
The second is while the benchmark in Node was running, mimicking a real-load scenario, which N1QL doesn't seem to cope with.
I reinstalled everything to make sure it wasn't an issue with my installation. Same result.
I was curious and tried N1QL with the new memory-optimized index; even with everything in RAM it performs horribly, as above.

So I made another try and moved everything to my setup in the cloud. A fresh machine, just installed: better results for both views and N1QL, but the difference between them is roughly the same; maybe slightly better than on my machine, but we are in the same ballpark.

I decided to give it one last chance, really the last. I moved the Node.js client and the Couchbase server to two different servers, both quad-core with 4 GB of RAM. Identical results again.
So it cannot be my setup; maybe with a lot of changes and much higher resources I could make it on par with views, but right now I will avoid N1QL.

Furthermore, as if that weren't enough, N1QL maxed out CPU usage, while views peaked at 50%.


So it seems to me N1QL doesn't handle concurrency well.

Will you be running EE or CE in production?

If EE, you should test with a separate node for query service. If you have three nodes, you can do (1) client, (2) data + index, (3) query. If you have four nodes, you can separate data and index.

These options are not available in CE.

The test was done with EE. It doesn't seem a fair comparison, comparing 1 machine against 2, to be honest.
To be fair, I should compare
1 cluster with 2 data nodes for views
against
1 cluster with 1 data node and 1 query node for N1QL. But then the query node will have to go through the network to talk to the index node, so I don't think it will win there either.

But at this point, unless you can provide some internal benchmark showing N1QL on par with views (on similar setups, of course; ten 12-core machines with 64 GB of RAM can beat one dual-core with 2 GB), I don't feel like continuing this, as so far it has always proven far worse in all scenarios, including the in-memory index, which should have won by far but lagged behind even when comparing RAM against disk.
We will design the architecture using key-value only and sporadic views.

I understand. For production use at scale, N1QL was designed so that data, index, and query services run on separate nodes. Otherwise, they all experience resource contention. Key-value + views is a good alternative if you don’t need the rest of the N1QL functionality.


If your result set is large, please change the batch size to a higher value.
We have an additional fix in Couchbase release 4.6 to improve the data transfer rate from the query service to the client. Thanks.

curl -u Administrator:password http://localhost:8093/admin/settings > z.json

Edit z.json and increase servicers. Example:

{"completed-limit":4000,"completed-threshold":1000,"cpuprofile":"","debug":false,"keep-alive-length":16384,"loglevel":"INFO","max-parallelism":1,"memprofile":"","pipeline-batch":16,"pipeline-cap":512,"request-size-cap":67108864,"scan-cap":0,"servicers":32,"timeout":0}

Change the relevant values. E.g., if your queries fetch a large number of documents each, increase pipeline-batch to 1024 and pipeline-cap to 4096.

Then do the following:

curl -u Administrator:password http://localhost:8093/admin/settings -XPOST -d @./z.json
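The hand-edit can also be scripted; a sketch assuming jq is installed (the sample JSON here stands in for the real output of GET /admin/settings):

```shell
# Hypothetical scripted version of the edit above (assumes jq is installed).
# The sample JSON stands in for the real output of GET /admin/settings.
printf '%s' '{"pipeline-batch":16,"pipeline-cap":512,"servicers":4}' > z.json

# Bump the relevant knobs; any other fields in the file are left untouched.
jq '.servicers = 32 | .["pipeline-batch"] = 1024 | .["pipeline-cap"] = 4096' z.json > z_new.json
cat z_new.json
```

The edited file can then be POSTed back with the same curl -XPOST call, pointing at z_new.json.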

Thanks! I have integrated this REST call into my setup script now.
However, it didn't improve performance. I think, based on the graph above, it's getting CPU-bound, which doesn't happen with views, so it's using many more resources.