Couchbase benchmark: Insert and Get using KeyValue operations very slow compared to Redis

I have made a simple benchmark test, where I compare inserting and fetching data from Couchbase and Redis. The scenario:

  • Insert 100 000 JSON documents into Couchbase as well as into Redis
  • Randomly fetch documents using KeyValue operations for Couchbase and the standard way in Redis


  • Inserting 100 000 objects into Couchbase (code below), takes 60 seconds flat

  • Inserting 100 000 objects into Redis (code below), takes about 8 seconds

  • Randomly fetching a document/KeyValue (average times over 1000 GETs):
      • Couchbase: about 0.55 ms to 1 ms using KeyValue operations (500 - 600 ms for 1000 GETs)
      • Redis: about 0.055 ms (50 - 60 ms for 1000 GETs)

Note: I am using ServiceStack Redis lib to interact with Redis.

INSERT code:

        // I create a List<JObject> jsonObjects above, to not have that measured in the bench
        // ....

        IBucket bucket = await cluster.BucketAsync("myBucket");
        IScope scope = bucket.Scope("myScope");
        var collection = scope.Collection("myCollection");

        Stopwatch sw = Stopwatch.StartNew();
        foreach (JObject temp in jsonObjects)
            await collection.InsertAsync(temp.GetValue("JobId").ToString(), temp);
        Console.WriteLine($"Adding {nbr} to Couchbase took {sw.ElapsedMilliseconds} ms"); // <-- 60 seconds
        // Note: I tried collecting all tasks and doing a Task.WhenAll(taskList), but it didn't make much of a difference

        sw.Restart(); // reset so the Redis timing doesn't include the Couchbase time
        using (var client = redisManager.GetClient())
            foreach (JObject temp in jsonObjects)
                client.Set($"jobId:{temp.GetValue("JobId")}", temp.ToString());
        Console.WriteLine($"Adding {nbr} to Redis took {sw.ElapsedMilliseconds} ms"); // <-- 8 seconds

GET code:

        // COUCHBASE
        int lim = 10;
        for (int q = 0; q < lim; q++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < nbr; i++)
            {
                string key = $"{r.Next(1, 100000)}";
                await collection.GetAsync(key);
            }
            // Note: also tried collecting all tasks and doing Task.WhenAll(taskList); no relevant time was saved
            Console.WriteLine($"Couchbase Q: {q}\t{sw.ElapsedMilliseconds}");  // <-- about 600 ms
        }

        // REDIS:
        for (int q = 0; q < 10; q++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            using (var client = redisManager.GetClient())
            {
                for (int i = 0; i < nbr; i++)
                    client.Get<string>($"jobId:{r.Next(1, 100000)}");
            }
            Console.WriteLine($"Redis Q: {q}\t{sw.ElapsedMilliseconds}"); // <-- about 60 ms
        }

Redis is 10 times faster for both Inserts and Gets (Gets using KeyValue operations). If I use the Couchbase Query API instead, it takes approximately 1500 ms.

I can understand that a Query takes longer; it's a much more advanced and much more powerful piece of functionality. But I don't really understand why it has to take 10 times longer when using KeyValue operations. I would expect Couchbase to be much closer to Redis regarding the execution times.

Can someone shed some light here? Do these numbers seem legit, correct, valid?


Hello @wagger

Sorry for the delay; this was incorrectly tagged as Couchbase Server.
There are quite a few observations:

Both your Insert and Get loops, even though they call async operations, are effectively behaving like sync operations, because each call is awaited before the next one starts.
Can you change your loop to use Parallel ForEach with some max degree of parallelism set?
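As a minimal sketch of what I mean (assuming .NET 6+ for Parallel.ForEachAsync, and the jsonObjects and collection variables from your code above; the degree of parallelism of 16 is just an arbitrary starting point to tune):

```csharp
// Issue inserts concurrently, capped at 16 in flight at once,
// instead of awaiting each insert before starting the next.
var options = new ParallelOptions { MaxDegreeOfParallelism = 16 };
await Parallel.ForEachAsync(jsonObjects, options, async (temp, ct) =>
{
    await collection.InsertAsync(temp.GetValue("JobId").ToString(), temp);
});
```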

Talking about N1QL, I cannot tell for sure here. However:
Are you writing queries against the key, or against a non-key field?
Do you have the right indexes in place?

@Richard_Ponton, @jmorris , @btburnett3 any other comments or suggestions ?

The parallelism isn’t the problem. Even if I remove parallelism, it’s still extremely slow compared to Redis, for example.

I am using KeyValue operations for the reads, so those should be indexed and fine? The inserts should not be affected by indexing in any case, but they are also very slow.

I am engaged in a StackOverflow thread with the same question:

Would it be OK to continue there? I have posted a github url to repo of benchmark code.


That should be fine, our team is active on StackOverflow as well.

Thanks, feel free to jump in there :blush:

Unfortunately, I don’t have enough reputation to comment on StackOverflow. However, I’ll add a couple of points:

Your parallelization method for the inserts is a bit inefficient, which is probably why you’re seeing so much variability. See this post for some information on optimizing parallel operations: How to do .Get<T>(List keys) in SDK 3.x in C#?. Also, your Get approach is not parallelized in the example; it’s sequential.
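For the Gets, a hypothetical sketch of bounded parallelism (assuming the collection, nbr, and r variables from your benchmark; the cap of 32 is an assumption to tune for your environment):

```csharp
// Fire KV gets concurrently, but cap how many are in flight at once
// with a SemaphoreSlim, then await them all together.
var throttle = new SemaphoreSlim(32);
var tasks = new List<Task>();
for (int i = 0; i < nbr; i++)
{
    string key = $"{r.Next(1, 100000)}";
    tasks.Add(Task.Run(async () =>
    {
        await throttle.WaitAsync();
        try { await collection.GetAsync(key); }
        finally { throttle.Release(); }
    }));
}
await Task.WhenAll(tasks);
```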

However, I agree that these timings don’t make much sense. I typically see sub-millisecond times for handling documents using the key/value API. I’m curious as to a few points:

  • Have you tried strong types instead of JObject? It’s possible our particular serializer implementation for JObject is inefficient.
  • You might try tweaking the minimum and maximum connection pool size. When running large numbers of parallel operations having a larger pool of connections may increase performance. Probably not as valuable running entirely on the local machine, though.
  • Are you running Couchbase inside any kind of VM or Docker container, or directly on Windows? If Docker, are you using a Hyper-V VM or WSL 2?
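On the connection pool point, a hedged sketch of what the tuning looks like in the .NET SDK 3.x ClusterOptions (verify the exact property names against your SDK version; the credentials and pool sizes here are placeholder assumptions):

```csharp
// Widen the per-node key/value connection pool so more parallel
// operations can be serviced at once.
var clusterOptions = new ClusterOptions
{
    UserName = "Administrator",  // assumed credentials
    Password = "password",
    NumKvConnections = 4,        // minimum KV connections per node
    MaxKvConnections = 16        // ceiling for the KV connection pool
};
var cluster = await Cluster.ConnectAsync("couchbase://localhost", clusterOptions);
```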

Thanks for the answer.

But, I think there is way too much focus on the parallelism part =) In the SO post, I wrote that “Note: Using Task.WhenAll, or awaiting each call, doesn't make a difference”, and I re-ran the tests several times to verify that. I have now removed the parallelism, and instead this is the code I use:

IBucket bucket = await cluster.BucketAsync("myBucket");
IScope scope = bucket.Scope("myScope");
var collection = scope.Collection("myCollection");

// avoid measuring lazy loading:
JObject t = JObject.FromObject(_baseJsonObject);
t["JobId"] = 0;
t["CustomerName"] = $"{firstnames[rand.Next(0, firstnames.Count - 1)]} {lastnames[rand.Next(0, lastnames.Count - 1)]}";
await collection.InsertAsync("0", t);
await collection.RemoveAsync("0");

Stopwatch sw = Stopwatch.StartNew();
foreach (JObject temp in jsonObjects)
    await collection.InsertAsync(temp.GetValue("JobId").ToString(), temp);
Console.WriteLine($"Adding {nbr} to Couchbase took {sw.ElapsedMilliseconds} ms");

With that said:

  • No, I have not tried strongly typed objects, as in this test the JSON file does not translate to a POCO, yet. I can change it up and see if it makes a difference.
  • I will skip this part and not run it in parallel, but, as I stated and showed on SO, it doesn't make any difference; I get the same execution times with or without the WhenAll approach. Also, the Redis, MySQL and Aerospike tests are all non-parallel. I just ran it without WhenAll, so sequentially, and got 95 seconds right now.
  • Couchbase is installed as a service on Windows 10, running on a Ryzen 3900X 12-core @ 4.2 GHz, with an M.2 SSD (not that that should matter in this case)

I have also uploaded the test to GitHub, here:

One more thing:

After I create the scope and collection, like this:

then I run this query, as a standard. Admittedly, I am not sure what this query does, because when I run it, the collection contains 0 documents:

CREATE PRIMARY INDEX ON default:myBucket.myScope.myCollection

After that, I run the code to insert 100 000 rows, as can be seen above and on GitHub.

Hello again!
I re-ran the INSERT code, now using POCOs instead. Unfortunately, the results are the same as before: 101 seconds to INSERT 100 000 documents.

I am uploading the changed code to GitHub, but here are some excerpts:

I added a Model.cs that contains the POCO:


And the line now inserting into Couchbase:

foreach (Root root in pocoObjects)
    await collection.InsertAsync(root.JobId.ToString(), root);

No one is interested in this? =)

I have a small LoadDriver application for the Java SDK that uses the KV API. With 4 threads, it shows an average response time of 54 microseconds and a throughput of 73,238 requests/second. It uses the travel-sample bucket on a Couchbase server on localhost.

Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 729873, requests/second: 72987, max: 2796us avg: 54us, aggregate rq/s: 73238

For the query API I get the following which is 18 times slower.

Run: seconds: 10, threads: 4, timeout: 40000us, threshold: 8000us requests/second: 0 (max), forced GC interval: 0ms
count: 41378, requests/second: 4137, max: 12032us avg: 965us, aggregate rq/s: 4144


I took your CouchbaseTests and commented out the non-Couchbase bits. I changed (corrected) the query to select from the collection (myCollection) instead of jobcache, removed the Metrics option, and created an index on JobId:
create index mybucket_JobId on default:myBucket.myScope.myCollection (JobId)
It inserts the 100,000 documents in 19 seconds, kv-fetches the documents in 146 usec on average, and queries by JobId in 965 usec on average.
Couchbase Q: 0 187
Couchbase Q: 1 176
Couchbase Q: 2 143
Couchbase Q: 3 147
Couchbase Q: 4 140
Couchbase Q: 5 138
Couchbase Q: 6 136
Couchbase Q: 7 139
Couchbase Q: 8 125
Couchbase Q: 9 129
average et: 146 ms per 1000 -> 146 usec / request

Couchbase Q: 0 1155
Couchbase Q: 1 1086
Couchbase Q: 2 1004
Couchbase Q: 3 901
Couchbase Q: 4 920
Couchbase Q: 5 929
Couchbase Q: 6 912
Couchbase Q: 7 911
Couchbase Q: 8 911
Couchbase Q: 9 927
average et: 965 ms per 1000 -> 965 usec / request (coincidentally exactly the same as with the Java API).

This was on 7.0 build 3739 on a MacBook Pro, with the Couchbase server running locally.

Hi @wagger -

Thanks for posting, I am the lead on the .NET SDK and have been OOO due to the holidays. I definitely am interested :slight_smile:

I’ve been reading through the posts and don’t yet have an answer for the performance you’re encountering. I’ll investigate and see what I come up with.




SDK version 3.1.1 was released yesterday, and it has a lot of performance improvements. In some of my testing, with a tweaked version of your benchmark, I was getting almost an order-of-magnitude reduction in run time on my laptop. I’d be very interested to know how it performs for you in your test environment.

Note that the main tweak I made to your benchmark was to control the degree of parallelism, as per this example in another post: Bulk Insert In Every Two Minutes
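The degree-of-parallelism control I used can also be sketched as simple batching (a rough illustration, assuming the jsonObjects and collection variables from earlier in this thread; the batch size of 100 is an arbitrary assumption to tune):

```csharp
// Insert in fixed-size batches so that at most `batchSize`
// operations are in flight at once, awaiting each batch in full
// before starting the next.
const int batchSize = 100;
for (int offset = 0; offset < jsonObjects.Count; offset += batchSize)
{
    var batch = jsonObjects
        .Skip(offset)
        .Take(batchSize)
        .Select(j => collection.InsertAsync(j.GetValue("JobId").ToString(), j));
    await Task.WhenAll(batch);
}
```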
