N1QL query throughput on an online production server doesn't exceed 1 op/sec

I have developed a Qt/C++ application that uses the C client SDK with Couchbase 4.5.1, using a blocking (wait) approach.

In the development environment, where the Couchbase server runs on the same machine as the application, throughput easily reaches 26 ops/sec.

However, when I deployed Couchbase on an online production server, throughput would not exceed 1 op/sec.

Am I missing some configuration for the server or the SDK?

You can query the server using the cbq shell, or using curl (or another HTTP client) against the N1QL REST API. This will help you isolate server vs. client issues. See the docs for the cbq shell and the N1QL REST API.

Dear @geraldss, Thank you for your reply. I will consider these options for our next release.

I have since found out that I was calling lcb_wait excessively. However, when I reduced the number of lcb_wait calls, and with it the time spent blocking, I started receiving this error for some N1QL queries:

“Couldn’t retrieve item: Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout”

How can I resolve this issue?

Adding @ingenthr to help with this.

Working through things in order, let me address your question on tuning and throughput, then the timeout you are observing.

First, regarding tuning, you shouldn’t need to do any special tuning. If you need synchronous requests for application logic, that’s probably fine. Otherwise, it may be best to use more instances of lcb_t. We can get more advanced with shared resources if needed, but a first-order solution might just be to pool more instances, if you need to do that.
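To make the pooling idea concrete, here is a minimal sketch of a fixed-size pool. It is not part of libcouchbase: `Handle` and `HandlePool` are hypothetical names, and `Handle` only stands in for an already created and connected lcb_t. Creating, connecting, and destroying the real instances is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a connected client handle (e.g. an lcb_t).
struct Handle {
    int id;
};

// Fixed-size pool: each borrower gets exclusive use of one handle
// until it is released back to the pool.
template <typename T>
class HandlePool {
public:
    explicit HandlePool(std::vector<T> handles) : free_(std::move(handles)) {
        assert(!free_.empty());  // a pool with no handles would block forever
    }

    // Blocks until a handle is free, then hands it to the caller.
    T acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        T h = std::move(free_.back());
        free_.pop_back();
        return h;
    }

    // Returns a handle to the pool and wakes one waiting borrower.
    void release(T h) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(h));
        }
        available_.notify_one();
    }

    // Number of handles currently not borrowed.
    std::size_t idle() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable available_;
    std::vector<T> free_;
};
```

A worker thread would call `acquire()`, issue its query and wait on that one handle, then call `release()`. That way a handle is never used by two threads at once, and several blocking requests can be in flight at the same time.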

As far as the change in throughput between development and production, since these are synchronous requests we should determine what’s causing the higher latency between requests. One thought is that the production environment has higher latency between the systems, but the numbers you give have a very large difference. What is different between the development and production environments? Is the latency between the client and the server relatively low? Is the data set the same?
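One way to see where the time goes is to time each blocking call on the client in both environments. The helper below is only a sketch: `time_call_ms` is a hypothetical name, and the callable you pass in would wrap whatever your application does for one request (for example, scheduling one query and then waiting for it).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs one blocking operation and returns its wall-clock duration
// in milliseconds, measured with a monotonic clock.
inline double time_call_ms(const std::function<void()>& op) {
    assert(op != nullptr);  // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    op();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Comparing these per-request timings against the same query run through cbq or curl on the server should indicate whether the extra latency is in the network, the query itself, or the client code.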

@spyman114, on the timeout you’re seeing: the client-side timeout for N1QL queries is 75s by default. If you’re not modifying the default timeout and you’re getting a timeout, is it possible that the query you’re issuing takes more time than that? Can you post the query, an EXPLAIN of that query, and characterize the data set it’s operating on? It could be that an aggregation on fields that aren’t covered by an index explains the bulk of the time, but we’d need more info to be able to help you understand what’s happening.

Assuming it’s synchronous and you went from an average of about 38ms to about 1000ms of latency (based on your throughput), we’d have to account for that increased latency somewhere assuming all other variables hold constant.
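For reference, those latency figures come straight from the throughput numbers, assuming requests are issued strictly one after another. The function name `avg_latency_ms` is just for illustration.

```cpp
#include <cassert>

// With strictly sequential blocking requests, each request must finish
// before the next one starts, so average latency per request is
// 1000 ms divided by the number of operations per second.
inline double avg_latency_ms(double ops_per_sec) {
    assert(ops_per_sec > 0.0);
    return 1000.0 / ops_per_sec;
}
```

With the numbers from this thread, `avg_latency_ms(26.0)` is roughly 38.5 ms for development and `avg_latency_ms(1.0)` is 1000 ms for production.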

Finally, I don’t think calling lcb_wait excessively would cause this, but I should also be clear that libcouchbase is non-reentrant. You indicate it’s synchronous, so I don’t think you are using the lcb_t from multiple threads calling that lcb_wait(), but if you are that’d be a programming error. It’d be great if you could characterize what the program is trying to do too.

@spyman114 did you need any help here? Did you resolve your issue?

@ingenthr Yes. Thanks to your explanations and advice, I went with initializing more instances, and that resolved the issue.

I come from an embedded programming background, so I always try to be careful with resources.
I had one lcb_t as a static resource for accessing the database. Whenever I needed to access the database, I would Get() the instance. I tried two different methods:

  • The first was to initialize the lcb_t connection and terminate it for each call. This was very slow in the production environment: 1 op/s.
  • The second was to initialize the lcb_t connection once at application startup and never terminate it. This was faster in the production environment, but I ran into timeouts on certain queries, and the responses did not arrive in the order the queries were issued.

I resorted to trial and error, I guess because I couldn’t find similar examples or best practices for using the client libraries.

I hope that in the future one can find demo apps: single- and multi-threaded, synchronous and asynchronous.