Hi everyone,
I am new to Couchbase Analytics (CBA) and was wondering if anyone could help me with performance optimization.
I have N GB of data in CBA. My test is built on Node.js, and all requests are made via the official SDK 4.4.4 (Hello World | Couchbase Docs).
What I do:
I run 100 requests at the same time, with custom logging of the requests on the client side, and collect information about the completed queries via /analytics/admin/completed_requests.
Knowing the clientContextID, I can join these two data sources into one.
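Roughly, the client side of the harness looks like this - a minimal sketch, with the connection string, credentials, dataset and query as placeholders, and the completed_requests field names written from memory (the real script does more careful logging):

```js
const couchbase = require('couchbase');

async function runBatch() {
  // Placeholder connection details.
  const cluster = await couchbase.connect('couchbase://analytics_node_ip', {
    username: 'admin',
    password: 'password',
  });

  // Fire 100 analytics queries at once, tagging each with a clientContextId
  // so it can be matched against the server-side completed_requests entries.
  const clientSide = await Promise.all(
    Array.from({ length: 100 }, async (_, i) => {
      const clientContextId = `perf-test-${Date.now()}-${i}`;
      const sentAt = Date.now();
      await cluster.analyticsQuery('SELECT COUNT(*) AS cnt FROM myDataset', {
        clientContextId,
      });
      return { clientContextId, sentAt, endedAt: Date.now() };
    })
  );

  // Pull the server-side view of the same requests (admin credentials required)
  // and join the two sources on the client context id.
  const auth = 'Basic ' + Buffer.from('admin:password').toString('base64');
  const res = await fetch(
    'http://analytics_node_ip:8095/analytics/admin/completed_requests',
    { headers: { Authorization: auth } }
  );
  const completed = await res.json();

  const merged = clientSide.map((t) => ({
    ...t,
    server: completed.find((r) => r.clientContextID === t.clientContextId),
  }));
  console.log(merged);
}

runBatch().catch(console.error);
```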
On the following screenshot we can see:

| Marker | Source |
| --- | --- |
| Blue diamond | client request sent time |
| Blue star | requestTime |
| Orange cross | jobCreateTime |
| Green triangle (right) | jobStartTime |
| Green square | jobEndTime |
| Purple X | client request end time |
A slight time shift between client and server is present, but it is not crucial for this experiment.
This is the representation of one batch with 100 requests.
And this is the representation of the test as a whole.
As you can see, the requests behave the same way across the batches.
If we pay attention to one batch, the problem is that the jobs are in some sort of queue and are not executing in a parallel fashion.
Are there any techniques or guidance on how I should move forward with performance optimization?
Is there any low-hanging fruit that I should pay attention to?
Many thanks!
Each request to the server requires an http connection from start to finish. There is a maximum number of http connections (a pool) for each server node in the SDK (for the Java SDK, the default is 12). Once all those connections are occupied, the SDK queues subsequent requests. As the first requests complete and connections become free, it processes the queued requests. So increasing the number of http connections will allow more concurrent execution. Keep in mind that more concurrently executing requests on the server will require more resources on the server, and may not improve throughput. It may actually degrade throughput.
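If it helps to picture the effect, here is a rough illustration (not the SDK's actual internals - just a fixed-size pool in plain Node.js) of how work queues up once all slots are busy:

```js
// Illustration only: a fixed-size "pool" that queues work once all slots are
// busy, similar in spirit to how the SDK queues requests while all of its
// HTTP connections are occupied.
function createPool(size) {
  let inFlight = 0;
  const waiting = [];

  async function run(task) {
    if (inFlight >= size) {
      // All "connections" are busy: park this request until one frees up.
      await new Promise((resolve) => waiting.push(resolve));
    }
    inFlight++;
    try {
      return await task();
    } finally {
      inFlight--;
      const next = waiting.shift();
      if (next) next(); // hand the freed slot to the next queued request
    }
  }

  return { run };
}

// 100 tasks submitted at once, but at most 12 ever run concurrently; the rest
// sit in the queue - which is the staircase pattern visible within a batch.
const pool = createPool(12);
for (let i = 0; i < 100; i++) {
  pool.run(() => new Promise((resolve) => setTimeout(resolve, 1000)));
}
```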
It’s also worth knowing that there are additional concurrency limits in the Couchbase analytics server. For each query it’ll estimate how much CPU and memory that query will consume, and check those against what is already running on the system. If it decides it can’t run it currently, it’ll queue it up. (This is all a somewhat simplified explanation.) The aim here is to make sure the server has all resources required to execute all concurrent queries, at all times.
IIRC this generally leads to around 3 or 4 queries executing concurrently, so I would guess this is the bottleneck you’re hitting, rather than the SDK 12 limit that Michael mentions.
If I zoom in on your batch graph I see that in each batch the first N queries execute in parallel, then roughly the next N, then it smooths out as variance kicks in. It's hard to estimate N as the graph is small (3-4 would suggest the server limit, 12 the SDK) - but clearly there is a buffer/queue somewhere.
All these settings are tunable though, so you can override both the SDK's 12-connection limit and make the analytics server more optimistic about how many queries it can run. Of course - this does run the risk of degrading service, if you overtune and push the server beyond what it can handle. (An analogy here may be overclocking a CPU beyond its factory settings.)
IIRC one of the primary settings for the server here is coresMultiplier. Please see Analytics Configuration Parameters | Couchbase Docs for more.
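Before changing anything, it can be worth reading back the current service-level configuration so you know your starting point. A quick sketch (host and credentials are placeholders; double-check the parameter names against the docs page above):

```js
// Read the current Analytics service-level config, which includes coresMultiplier.
// Hostname and credentials are placeholders.
const auth = 'Basic ' + Buffer.from('admin:password').toString('base64');

fetch('http://analytics_node_ip:8095/analytics/config/service', {
  headers: { Authorization: auth },
})
  .then((res) => res.json())
  .then((config) => console.log(config))
  .catch(console.error);
```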
Hi Graham,
Many thanks for your detailed reply, I really appreciate it.
Sharing the close-up image; the legend is the same as before.
As I understand it, Couchbase is able to start 4 jobs at the same time.
And later on, the requests are less bundled (I will share the image in the next post).
What I can say is that we have acquired a new server with 4x more RAM, and we will try to run more tests on it.
Hi @alexei.m,
The Analytics query admission policy is described in detail in this blog. By default, Analytics will execute 3 queries concurrently when each node in the cluster has the same number of data partitions as cores. If you’d like more queries to be executed concurrently, you can adjust the parameter coresMultiplier
(assuming there is enough RAM to execute all of them). For example, if you’d like 6 queries to be executed concurrently, you can adjust the parameter as follows:
curl -v -u admin:password -X PUT -d coresMultiplier=6 http://analytics_node_ip:8095/analytics/config/service
Then call the Analytics Service restart API for the change to take effect:
curl -v -u admin:password -X POST http://analytics_node_ip:8095/analytics/cluster/restart
Note that when more queries are executed concurrently, they will compete for CPU time and, as a result, each query might take longer. A better way to achieve higher throughput with the same SLA per query is to use nodes with more cores than data partitions.
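As a rough example (assuming the policy scales the way described above): a node with 8 cores and 8 data partitions admits about 3 queries at a time with the default coresMultiplier of 3, while a node with 16 cores and the same 8 data partitions would admit about 6 concurrent queries without changing coresMultiplier.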