We currently have 2 Couchbase nodes with 16GB of RAM and 4 cores each. We are planning to move to 3 nodes, each with 32GB of RAM. We ran some benchmark tests and found that the performance of get operations (with multiple IDs) was the same regardless of how much RAM the Couchbase server and/or buckets had. This leads us to think that the RAM we already have is probably enough to cache our most-used keys, so adding more RAM does not improve performance.
All our operations work by chunking the keys to retrieve into batches of documents, currently 2500 at a time, because with larger batches the script running on the application server fails with a Client Side Timeout error. How can we avoid this error and speed up the retrieve operation? (A simplified sketch of the retrieval loop is at the end of this post.)
Is there a limit on the maximum number of documents retrieved by the (multi) get operation? Or is it related to the size of the cluster on which Couchbase is running?
The documents used in the test are about 2kb each, with keys of about 15 bytes.
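For reference, the retrieval loop is basically the following (a simplified sketch, not the actual benchmark; `fetch_batch` stands in for whatever bulk/multi get call the SDK exposes):

```python
from typing import Dict, Iterable, List

BATCH_SIZE = 2500  # larger batches make the script fail with "Client Side Timeout"

def chunks(keys: List[str], size: int) -> Iterable[List[str]]:
    """Yield successive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]

def fetch_batch(batch: List[str]) -> Dict[str, dict]:
    # Placeholder for the SDK's bulk/multi get call; the real script
    # issues one multi-get per batch of 2500 keys.
    raise NotImplementedError

def fetch_all(keys: List[str]) -> Dict[str, dict]:
    results: Dict[str, dict] = {}
    for batch in chunks(keys, BATCH_SIZE):
        results.update(fetch_batch(batch))
    return results
```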
What version of Couchbase Server are you running?
Which SDK are you using for the benchmark tests? (Can you share the benchmark code?)
There may be a limit in the underlying SDK’s architecture on the number of requests that can be batched. A client-side timeout can be caused by a number of things - network issues, a slow server response, a client-side problem… We can better understand the issue once we know the answers to the questions above.
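Independently of the root cause, the client-side operation timeout itself is configurable, so raising it is an easy thing to try while we dig into the details. For example, if you happen to be on the Python SDK (4.x-style API; host, credentials and bucket name below are placeholders), it looks roughly like this:

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, ClusterTimeoutOptions

# Raise the key-value operation timeout from the default (2.5 s) to 10 s.
timeouts = ClusterTimeoutOptions(kv_timeout=timedelta(seconds=10))
cluster = Cluster(
    "couchbase://127.0.0.1",
    ClusterOptions(PasswordAuthenticator("user", "password"),
                   timeout_options=timeouts),
)
collection = cluster.bucket("my-bucket").default_collection()

doc = collection.get("some-key")  # KV gets now have a 10 s budget
print(doc.content_as[dict])
```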
Are you interested in maximising throughput, or minimising latency?
If it’s the former then you can essentially batch as large as you like, but expect diminishing returns after a while (depending on document size and environment).
If it’s the latter then you basically don’t want to batch at all (i.e. request documents individually), and spread the requests across a number of independent connections - that way one document taking a bit longer to come back doesn’t hold up the other requests.
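To make the latency-oriented option concrete, here’s a rough Python sketch that issues individual gets from a pool of workers (connection details are placeholders, and depending on the SDK you may want one connection per worker rather than a shared one):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Placeholder connection details.
cluster = Cluster("couchbase://127.0.0.1",
                  ClusterOptions(PasswordAuthenticator("user", "password")))
collection = cluster.bucket("my-bucket").default_collection()

def fetch_one(key: str) -> Tuple[str, dict]:
    # An individual KV get: a slow document only delays its own result,
    # not a whole batch of 2500.
    return key, collection.get(key).content_as[dict]

def fetch_concurrently(keys: Iterable[str], workers: int = 32) -> Dict[str, dict]:
    # Keep `workers` requests in flight at once; tune this against your
    # own environment rather than assuming a universal optimum.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_one, keys))
```

The same worker-pool structure also helps throughput: rather than sending one giant batch, you keep a steady number of requests in flight.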
We tried running more scripts concurrently, but the overall throughput is the same with 2 or 3 nodes. So, what is the right way to maximize throughput? What do the number of nodes and the available RAM actually affect?
Hi @drigby, we are interested in maximising throughput. How can we determine the optimal batch size? Is it influenced by the total RAM or the number of nodes in the cluster?
Is a document size of about 2kb reasonable, or is it better to have smaller/larger documents to optimize performance?