Couchbase 4.1 Community Edition benchmark

Hi all…
We are benchmarking Couchbase and observing some very strange behaviour.

Setup phase:

Couchbase cluster machines:
2 x EC2 r3.xlarge with General Purpose 80 GB SSD (not EBS-optimised), IOPS 240/3000.
Couchbase settings:
Data Ram Quota: 22407 MB
Index Ram Quota: 2024 MB
Index Settings (default)
Per Node Ram Quota: 22407 MB
Total Bucket Size: 44814 MB (22407 x 2)
Replicas enabled (1)
Disk I/O Optimisation (Low)

  • Each node runs all three services

Couchbase client:
1 x EC2 m4.xlarge with General Purpose 20 GB SSD (EBS-optimised), IOPS 60/3000.
The client runs the ‘YCSB’ benchmark tool.

PS: All the machines are residing within the same VPC and subnet.

ycsb load couchbase -s -P workloads/workloada -p recordcount=100000000 -p core_workload_insertion_retry_limit=3 -p couchbase.url=http://HOST:8091/pools -p couchbase.bucket=test -threads 20 | tee workloadaLoad.dat
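For scale, here is a quick back-of-envelope on what that load command implies. This is my own arithmetic, not from the thread, and it assumes YCSB's default record layout (fieldcount=10, fieldlength=100, i.e. ~1 KB of raw value data per record):

```python
# Rough capacity estimate for the load above; the record size is an
# assumption based on YCSB's core-workload defaults (10 fields x 100 bytes).
RECORDS = 100_000_000
RECORD_BYTES = 10 * 100   # fieldcount=10, fieldlength=100 (YCSB defaults)
REPLICAS = 1

raw_gb = RECORDS * RECORD_BYTES / 10**9
with_replica_gb = raw_gb * (1 + REPLICAS)  # each replica is a full copy

print(f"raw values: {raw_gb:.0f} GB, with replica: {with_replica_gb:.0f} GB")
```

That works out to roughly 100 GB of raw values and ~200 GB with the replica, before metadata and compaction overhead, against 2 x 80 GB = 160 GB of total cluster disk.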


While everything works as expected
The average ops/sec is ~21000
The ‘disk write queue’ graph is floating between 200K - 600K (periodically drained).
The ‘temp OOM per sec’ graph is at constant 0.

When things start to get weird
After about ~27M documents inserted, we start seeing the ‘disk write queue’ rising constantly (not getting drained).
At a disk queue size of about ~8M items, OOM failures start to show themselves and the client receives ‘Temporary failure’ from Couchbase.
After 3 retries per YCSB thread, the client stops after inserting only ~27% of the overall documents.
Even after the YCSB client has stopped running, the ‘disk write queue’ moves asymptotically towards 0 and is drained only after ~15 min.
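One way to soften those ‘Temporary failure’ responses on the client side is to back off before retrying instead of retrying immediately. A minimal sketch of that idea (my own, not YCSB's actual retry logic; `TemporaryFailure` and `store` are hypothetical names):

```python
import random
import time

class TemporaryFailure(Exception):
    """Stand-in for the temp-OOM error the server returns under pressure."""

def insert_with_backoff(store, key, doc, retries=3, base_delay=0.05):
    """Retry a failed insert with exponential backoff plus jitter,
    giving the server's disk write queue time to drain between attempts."""
    for attempt in range(retries + 1):
        try:
            return store(key, doc)
        except TemporaryFailure:
            if attempt == retries:
                raise  # out of retries; surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Immediate retries against a server that is already temp-OOM tend to fail all three times in quick succession, which matches the thread-by-thread stops described above.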

When we benchmark locally on a MacBook with 16 GB of RAM + an SSD disk (local client + one-node server), we do not observe such behaviour, and the ‘disk write queue’ is constantly drained in a predictable manner.


Logs would help. If the disk write queue isn’t draining, this isn’t surprising: we cannot create enough new free pages to receive the new incoming data.
Also, what is the exact version you are using? We do not have a 4.1 Community Edition, so I suspect you may be running some unreleased/untested version.
One other point: I’d not recommend running benchmarks on Couchbase Server Community Edition. Even though it may not impact your insert performance here, Enterprise Edition has a number of performance and scale capabilities that CE does not.

First of all thanks for the quick response.

I apologise regarding the server version; it was a typo on my part.
We are testing Couchbase v4.0 CE.

I totally understand why we receive the "Temporary failure" errors from the server; what I don’t understand is the disk I/O behaviour.
Since I am not authorised to upload attachments, which log files could shed some light on what is going on?

Thanks, Eli

Simply collect cbcollect_info (under the Logs tab) and email me its location; I’ll check whether Couchbase Server is being hampered by some other issue.

Hi Cihan,
Here are the logs.

Thanks in advance, Eli.

Apologies for the delay. I did look at the file. There is evidence that your compaction is having issues; I am not yet sure why, so I need to look at this more. Compaction is the process we run once a certain level of fragmentation has been reached on your files, and it looks like compaction is being kicked off repeatedly and never finishing. You can see it in the log lines below.
It may be that the I/O bandwidth on the nodes isn’t enough to allow us to write as fast as we’d like.

[ns_server:info,2016-01-12T08:53:41.119Z,<0.21598.0>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket test with config:

[ns_server:info,2016-01-12T08:53:41.131Z,<0.21602.0>:compaction_new_daemon:spawn_scheduled_views_compactor:493]Start compaction of indexes for bucket test with config:

[ns_server:info,2016-01-12T08:54:11.122Z,<0.21933.0>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket test with config:
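As background on what those compactor messages mean: fragmentation in Couchbase's append-only data files is roughly the share of each file occupied by stale data, and compaction is scheduled once it crosses a threshold. A sketch of that logic (the 30% figure is the admin UI's default database-fragmentation trigger, stated here as an assumption):

```python
def fragmentation_pct(file_bytes, data_bytes):
    """Share of an append-only data file occupied by stale (dead) data."""
    return 100.0 * (file_bytes - data_bytes) / file_bytes

def should_compact(file_bytes, data_bytes, threshold_pct=30.0):
    """Compaction is kicked off once fragmentation crosses the threshold;
    if it can never finish, files keep growing and writes back up."""
    return fragmentation_pct(file_bytes, data_bytes) >= threshold_pct
```

In this thread's failure mode, `should_compact` keeps returning true, compaction keeps restarting without completing, and the disk write queue has nowhere to drain to.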

I’ll post Eli’s private response to close the loop:
The issue was that compaction could not finish because the total disk size was not provisioned at the recommended multiple (3x) of the raw data size. After increasing the free space on the nodes, compaction was able to proceed.
To provide more obvious alerting on low disk space, I’ve added the following issue for us to track.
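The sizing rule from that resolution can be written down as a one-liner (a sketch of the rule of thumb stated in this thread, not an official formula):

```python
def recommended_disk_gb(data_on_disk_gb, multiple=3):
    """Keep total disk at roughly 3x the on-disk data size so compaction
    can write a fresh copy of each file while the old copy still exists,
    with headroom left over for incoming writes."""
    return data_on_disk_gb * multiple

# e.g. ~50 GB of data on a node calls for ~150 GB of disk on that node
```

Under that rule, the 80 GB disks in this setup run out of compaction headroom well before the full 100M-record dataset is loaded, which matches the stall at ~27% of inserts.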

