Couchbase performance (misconfiguration or something else?)


We are currently testing NoSQL technology with real enterprise data. We have already tested Cassandra and MongoDB and we were looking forward to testing Couchbase.

To do this, we have set up a 4-node cluster. Each machine has a dual-core i3 at 3.2 GHz, 4 GB of RAM and a non-SSD drive.

The cluster configuration is as follows:

  • No replication
  • Cluster quota (12.7 GB)
  • Per Node RAM Quota: 3261 MB
  • Total bucket size = 13044 MB (3261 MB x 4 nodes)
  • Disk I/O Optimization: Low (default)
  • Auto-Compaction: force option disabled
  • Flush enabled

Our workload is 100% writes. There are two nodes which send data to the cluster, with an application which creates 5 processes on one node and 3 on the other. These processes then connect in parallel to Couchbase and send large batches of data. The experiment is repeated 13 times, after which we reboot and send another (bigger) workload.
The application used to insert the data (the one which spawns 8 processes) is written in C# and is using the C# driver for Couchbase. Given that we are inserting large bulks of data, we are using the “BulkLoader”/“StoreHandler” class provided in the documentation for our inserts. The data is sent as a JSON string sequentially using the aforementioned StoreHandler.

After some initial testing we noticed that Couchbase was performing very poorly, so we began wondering whether there was a problem with our configuration or with our insertion process. We believe that sharding is working, as the web interface seems to show an even distribution of data across all nodes.

We were hoping you could help us find out if there really is something wrong with our configuration, and help us solve the issue. We would be very grateful for this.

We can supply any further information about our system in case it is needed.

Best regards

@jorl17 -

A couple of questions:

  • What version of the client are you using?
  • What was your expected/actual performance?
  • How big is the document size?
  • Are you maintaining the client instance across multiple calls or creating/disposing it on every call?



Hi! Thank you for your fast reply! In response to your questions:

  • We are using the most recent version that comes from NuGet’s package manager in VS 2010.
  • Our measurements show that inserting 10k documents takes around 8 to 15 s, which gives us a throughput of roughly 700 ops/sec. This throughput remains (mostly) constant throughout our experiment (that is, with ever-increasing data sizes). Another NoSQL solution took roughly 500-800 ms to complete 10k inserts, for a throughput of around 13300 ops/sec. That solution's performance degraded significantly faster, but it never went below 9k ops/sec, let alone as low as ~700 ops/sec.
  • Regarding the document size, each record has 8 attributes (3 longs, 2 dates, 1 boolean, 1 string and 1 float) + the record ID, which is a UUID string generated by System.Guid.NewGuid().
  • We use a static instance of the aforementioned StoreHandler class. However, each of our processes spawns only one writing thread (the concurrency comes from the different processes), so in practice its being static is not relevant (at least we think so). After opening a connection, we send all the data for this workload and then close the connection (and kill the process, as the test run is over).

Best regards, thank you.

@jorl17 -

The most recent GA release of the 1.X client is 1.3.10; there is also a beta 2 of 2.X on NuGet. I am guessing you are using 1.3.10? (Hint: look for ExecuteXXX methods, where XXX is an operation like Get or Set.)

The performance you are seeing is more likely a consequence of your architecture and/or the client itself, as opposed to Couchbase Server. The single writer thread means you're probably only using one connection at a time, effectively serializing the throughput, but it's really hard to say.


@jorl17, what version of Couchbase Server are you testing on? And what bucket-level settings have you chosen?

@cihangirb From the log files, our version is 3.0.1-1444-rel-enterprise. Regarding our bucket settings, they are the ones we listed in the first post. Maybe we forgot something? If we did, could you point it out, please?

@jmorris We do have ExecuteStore methods. While we are aware that we are serializing the throughput, other solutions we tested always provided us with some batch insert functionality, and maybe internally it used parallelization – do you think we should, then, try to do this ourselves?


Thanks. Would you also make sure you enable full ejection on the bucket? For write-heavy workloads it will provide better support for you.
This is under the bucket settings, and it changes the way we manage memory in the system to tune it for write-heavy workloads.

@jorl17 -

Yes, here is an example of using the TPL and partitioning:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Couchbase;
    using Enyim.Caching.Memcached;
    using Enyim.Caching.Memcached.Results;

    internal class Program
    {
        // Shared client instance, reused for every operation.
        private static CouchbaseClient Client = new CouchbaseClient();

        private static void Main(string[] args)
        {
            var items = new Dictionary<string, string>();
            for (int i = 0; i < 100000; i++)
                items.Add("key" + i, "value" + i);
            var results = ParallelSet(items, 8, 1000);
        }

        private static ConcurrentDictionary<string, IOperationResult> ParallelSet<T>(IDictionary<string, T> items, int maxParallism, int rangeSize)
        {
            var keys = items.Keys.ToList();
            var results = new ConcurrentDictionary<string, IOperationResult>();
            var options = new ParallelOptions {MaxDegreeOfParallelism = maxParallism};
            // Split the index range [0, items.Count) into chunks of rangeSize.
            var partitioner = Partitioner.Create(0, items.Count, rangeSize);
            Parallel.ForEach(partitioner, options, (range, loopstate) =>
            {
                for (var i = range.Item1; i < range.Item2; i++)
                {
                    var key = keys[i];
                    var value = items[key];
                    var result = Client.ExecuteStore(StoreMode.Set, key, value);
                    results.TryAdd(key, result);
                }
            });
            return results;
        }
    }

Using this I was able to achieve 40K ops/sec running Couchbase Server on localhost, which pretty much removes network latency.

You will need to tune maxParallism and rangeSize to your specific machine and needs to get the best performance.

Note that the client also has “Bulk” methods, but I am pretty sure this will give better performance.
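
For the tuning step, a minimal timing sketch along these lines may help compare a few combinations of maxParallism and rangeSize. It is only an illustration: TuningSketch and MeasureOpsPerSecond are hypothetical names, and the store delegate here writes to an in-memory dictionary as a placeholder, so the printed numbers mean nothing until you swap in the real call (for example Client.ExecuteStore(StoreMode.Set, key, value)).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

internal static class TuningSketch
{
    // Hypothetical helper: runs one parallel load and returns operations per second.
    // 'store' stands in for the real client call and is an assumption of this sketch.
    internal static double MeasureOpsPerSecond(
        IDictionary<string, string> items,
        Action<string, string> store,
        int maxParallism,
        int rangeSize)
    {
        var keys = items.Keys.ToList();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallism };
        // Same partitioning scheme as the example above: index chunks of rangeSize.
        var partitioner = Partitioner.Create(0, keys.Count, rangeSize);

        var watch = Stopwatch.StartNew();
        Parallel.ForEach(partitioner, options, range =>
        {
            for (var i = range.Item1; i < range.Item2; i++)
            {
                store(keys[i], items[keys[i]]);
            }
        });
        watch.Stop();

        return keys.Count / watch.Elapsed.TotalSeconds;
    }

    private static void Main()
    {
        var items = new Dictionary<string, string>();
        for (var i = 0; i < 100000; i++)
        {
            items.Add("key" + i, "value" + i);
        }

        // Placeholder store that only writes to memory; replace with the real client call.
        var sink = new ConcurrentDictionary<string, string>();
        Action<string, string> store = (key, value) => sink[key] = value;

        // Try a small grid of settings and print the measured throughput for each.
        foreach (var parallelism in new[] { 1, 2, 4, 8 })
        {
            foreach (var rangeSize in new[] { 100, 1000, 10000 })
            {
                var ops = MeasureOpsPerSecond(items, store, parallelism, rangeSize);
                Console.WriteLine("maxParallism={0} rangeSize={1} ops/sec={2:F0}",
                    parallelism, rangeSize, ops);
            }
        }
    }
}
```

Running each combination a few times and discarding the first run (which includes warm-up cost) should give steadier numbers than a single pass.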