.NET SDK Performance

To give a little background, we’re upgrading from a very old version of Couchbase (2.1.X) to Couchbase 5.5 (.NET Client 2.6.0), and we’ve noticed a significant performance impact on our application servers after moving to the latest .NET client. Our service handles batches of sets and gets (mostly gets) that range anywhere from 3 to 600+ items per batch on average. We used a memcached bucket in the past, but we’re trying to move to an Ephemeral bucket on the new cluster. When upgrading to the latest .NET client, we used the GetDocumentAsync and UpsertAsync methods to handle these batch get/set requests, along the lines of the following:

var result = Task.WhenAll(fileKeys.Select(key => getBucket(iEncryptor).GetDocumentAsync<Entity>(key)));
result.Wait(); // The results are used immediately afterwards; from what I've read, the async methods are the way to batch-send requests over the network

and

    var result = Task.WhenAll(
        writeDictionary
        .Select(pair => getBucket(iEncryptor).UpsertAsync<Entity>(pair.Key, pair.Value))
    );
    result.Wait();

We’ve run several small performance tests, which work out to approximately 1,500 gets and 350 sets per second against our Couchbase cluster, with the following results:

Pre-Upgrade:
35% CPU on our .NET application server, and an average response time of 183ms for the test.

Post-Upgrade:
65% CPU on our .NET application server, and an average response time of 310ms for the test.

We’ve checked that we’re re-using the same bucket instance, tried increasing the MaxSize on the pool (non-SSL, so a shared multiplexing pool by default), and switched back from an Ephemeral to a memcached bucket, but none of it had any effect.

Based on my research, it looks like async/await overhead might be the culprit, since each individual task is small (typically <1 ms to our cluster) and we’re dumping large batches into the generated task state machine. I was planning to try the sync methods across parallel threads when I found the ‘obsolete’ methods in the .NET client that basically implement this already:
getBucket().Get<Entity>(fileKeys);
getBucket().Upsert(writeDictionary);

Using these methods did show a significant improvement, but it still wasn’t close to our pre-upgrade performance (I don’t think the sync methods batch requests over the network, which might contribute to the extra cost):
Post-Upgrade with Obsolete methods:
45% CPU on our .NET application server, and an average response time of 245ms for the test.

We added additional logging to time the batch sets/gets, changed ONLY the method usage (async GetDocumentAsync/UpsertAsync vs the obsolete collection Get/Upsert methods), and recorded the values below:

Obsolete Methods:
45% CPU

  • 0.782ms per document in a ‘bulk’ get from Couchbase
  • 0.302ms per document in a ‘bulk’ write to Couchbase

Async based Methods:
65% CPU

  • 1.2ms per document in a ‘bulk’ get from Couchbase
  • 0.71ms per document in a ‘bulk’ write to Couchbase

*Notes:

  • The times are based on (total time for batch / number of items in the batch)
  • The performance test was the exact same
  • The only difference was the change in sdk method usage

I’m looking to see if anyone has come across these performance issues when dealing with batch requests to Couchbase using the latest .NET clients (or in general), or if any settings can be tweaked to improve performance.

Ideally, we’d like to get close to our pre-upgrade performance. Any input would be greatly appreciated!

@xulane

You are using the async methods synchronously. Use them like this:

var result = await Task.WhenAll(fileKeys.Select(key => getBucket(iEncryptor).GetDocumentAsync<Entity>(key)));

And like this:

var result = await Task.WhenAll(
    writeDictionary
    .Select(pair => getBucket(iEncryptor).UpsertAsync<Entity>(pair.Key, pair.Value))
);

You should get the best performance using the SDK straight “out of the box”, without changes to your config, with the possible exception of increasing MaxSize (depending on your environment).

That works in the ideal case where a service has been built around async from the start, but this is a legacy system that has existing interfaces and contracts it must adhere to.

I realize sync-over-async is never the first choice, but I don’t see another option (short of a major rewrite of our application), given that the sync API doesn’t have a bulk get (other than the obsolete ones), and that API doesn’t batch-send those requests over the network (from what I’ve read).

I also don’t see how making the system async would recover the CPU degradation we saw (35% pre-upgrade, or 45% using the obsolete bulk sync methods, vs 65% for the async option), or the service’s response time (that would require useful work that could proceed while waiting for the results, which isn’t the case for these requests – the data is required before processing can begin).

Going async would make the system more scalable, but it doesn’t mean it would gain this performance back.
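For completeness, the sync-over-async shape we’d likely end up with is something like the sketch below. It keeps the blocking call sites our contracts require, but caps the task fan-out with a SemaphoreSlim; the limit of 16 is a guess to be tuned, and getBucket/iEncryptor/Entity are the placeholders from the snippets above:

```csharp
// Sketch: bounded sync-over-async bulk get. The concurrency limit (16) is a
// guess to tune per environment; getBucket/iEncryptor/Entity come from the
// earlier snippets.
var throttle = new SemaphoreSlim(16);

var tasks = fileKeys.Select(async key =>
{
    await throttle.WaitAsync().ConfigureAwait(false);
    try
    {
        return await getBucket(iEncryptor)
            .GetDocumentAsync<Entity>(key)
            .ConfigureAwait(false);
    }
    finally
    {
        throttle.Release();
    }
}).ToList();

// Single blocking wait at the synchronous boundary, as before.
var results = Task.WhenAll(tasks).GetAwaiter().GetResult();
```

This wouldn’t remove the state-machine overhead, but it would at least bound how many tasks are in flight at once.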

Hi @xulane

Executing an async call in a synchronous context does carry some performance penalty, because it blocks the current thread, whereas the await operator allows the thread to be returned to the thread pool and reused. That said, I agree you should not be seeing the degradation you have described.

The bulk methods you are using on the async API are just utility methods that distribute the operations over as many connections as are available; they do not affect how the TCP packets are sent over the network, and are comparable to doing the same from your application code.
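Conceptually (this is a sketch of the idea, not the SDK source), the sync bulk overloads amount to something like the following: each key still becomes an individual synchronous operation, just spread across threads, with nothing combined at the network level. "bucket", "keys", and "Entity" are placeholders:

```csharp
// Rough illustration of what a sync "bulk" get amounts to: parallel
// single-key operations; nothing is batched on the wire.
var results = new ConcurrentDictionary<string, IOperationResult<Entity>>();
Parallel.ForEach(keys, key =>
{
    results[key] = bucket.Get<Entity>(key);
});
```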

I can see you’re using Couchbase Server 5.5 and .NET SDK 2.6.0; please can you share the following:

  • Are you providing any custom SDK configuration?
  • Is the new cluster comparable in size / memory / enabled services?

Also, MeepMeep is a .NET SDK performance tool, if you’d like to try it. It needs updating to the latest .NET SDK, but it should give you some idea of achievable performance.

The new cluster is a mirror image of the old cluster resource-wise: same number of virtual CPUs, RAM, etc. The major differences are that we updated to Windows Server 2016 and to the latest version of Couchbase.

As for enabled services in Couchbase 5.5, we’re only running the data service, since we’re only doing the most basic document gets/sets against it.

Configuration:
<couchbase operationTracingEnabled="false" operationTracingServerDurationEnabled="false"
           enableOperationTiming="false" enableQueryTiming="false" xdt:Transform="Replace">
  <servers>
    <add uri="http://…-1…:8091" />
    <add uri="http://…-2…:8091" />
    <add uri="http://…-3…:8091" />
    <add uri="http://…-4…:8091" />
    <add uri="http://…-5…:8091" />
    <add uri="http://…-6…:8091" />
  </servers>
  <buckets>
    <add name="bucketName" operationLifespan="120000">
      <connectionPool name="custom" maxSize="100" minSize="10" connectTimeout="20000" sendTimeout="120000" />
    </add>
  </buckets>
</couchbase>

Note:
The following tracing/timing settings were disabled to deal with the issue mentioned here:

operationTracingEnabled="false"
operationTracingServerDurationEnabled="false"
enableOperationTiming="false"
enableQueryTiming="false"

I’ll try MeepMeep today, and post the results.

A maxSize of 100 is very high; the default is 2. From .NET SDK 2.5.6 we switched to multiplexing connections by default, which allow data to be sent and received concurrently. These are more performant than a traditional connection pool, which sends/receives and then returns the connection to the pool. Also, all connections are built up during setup, which with 100 connections could affect start-up times.

Please can you try lowering this to, say, 5 and test again?
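For example, keeping everything else in your config the same, the pool element would become something like this (the minSize of 2 here is just a suggestion; the key change is maxSize):

```xml
<connectionPool name="custom" maxSize="5" minSize="2" connectTimeout="20000" sendTimeout="120000" />
```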

Otherwise the config looks fine.

PS: we released 2.6.2 of the .NET SDK earlier today, which should resolve the issue with operation timing.

Will do. We’ll run another set of tests with the max lowered to 5, and then run MeepMeep afterwards, and post the results.

Thanks!

I’ll look to update MeepMeep with the latest SDK version too.

I re-ran our performance/load test using MaxSize=5, and didn’t see much of a difference using the sync bulk get/set wrapper methods. There was a slight difference in response time, but I also didn’t run the test for an extended period, so I’d consider it within the margin of error.

Post-Upgrade with Obsolete methods:

  • 45% CPU (didn’t change)
  • Response Time: 256ms (vs 245ms for large pool)

@xulane -

Is this a web application or a console application? If it’s a non-web application, it may make sense to use the older, obsolete TAP-based methods, or at least roll your own extensions based on the same concepts so that you’re forward compatible, especially since your application is synchronous from the start.

@MikeGoldsmith

I ran the latest version of MeepMeep with the following results, across a variety of workload sizes for both sync and async.

Notes:

  • The per-operation times between sync and async match what I’m seeing: Async avg operation time (ms): 2.85966974 vs Sync avg operation time (ms): 0.921244880000001, but the sync operations/sec are much lower.
  • At the 100k workload, sync runs at roughly 20% CPU while async runs at 75%+. (Is the sync option doing threaded sync requests, or truly sequential sync?)

– 500 –

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=500
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=1000
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=False

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 500]
[Total docsize: 0]
[Total operations: 500]
[Failed operations: 0]
[Total time (ms): 72.1182]
[Avg operations per/second: 6933]
[Avg operation time (ms): 1.61481]
[Min successful operation time (ms): 0.3947]
[Max successful operation time (ms): 13.7446]
[95th percentile operation time (ms): 4.595695]
[98th percentile operation time (ms): 5.50376]
[99th percentile operation time (ms): 6.162905]

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=500
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=1000
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=True

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 500]
[Total docsize: 0]
[Total operations: 500]
[Failed operations: 0]
[Total time (ms): 453.7195]
[Avg operations per/second: 1102]
[Avg operation time (ms): 0.879687800000001]
[Min successful operation time (ms): 0.4331]
[Max successful operation time (ms): 18.7084]
[95th percentile operation time (ms): 1.8976]
[98th percentile operation time (ms): 3.58850399999999]
[99th percentile operation time (ms): 4.435995]

–5k–
Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=5000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=False

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 5000]
[Total docsize: 0]
[Total operations: 5000]
[Failed operations: 0]
[Total time (ms): 715.7795]
[Avg operations per/second: 6985]
[Avg operation time (ms): 2.85966974]
[Min successful operation time (ms): 0.3763]
[Max successful operation time (ms): 28.6535]
[95th percentile operation time(ms): 11.178515]
[98th percentile operation time (ms): 13.708498]
[99th percentile operation time (ms): 16.487178]

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=5000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=True

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 5000]
[Total docsize: 0]
[Total operations: 5000]
[Failed operations: 0]
[Total time (ms): 4721.4469]
[Avg operations per/second: 1059]
[Avg operation time (ms): 0.921244880000001]
[Min successful operation time (ms): 0.4085]
[Max successful operation time (ms): 23.4117]
[95th percentile operation time (ms): 1.80428]
[98th percentile operation time (ms): 4.88375799999999]
[99th percentile operation time (ms): 6.57278700000001]

–20k–

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=20000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=False

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 20000]
[Total docsize: 0]
[Total operations: 20000]
[Failed operations: 0]
[Total time (ms): 2397.5838]
[Avg operations per/second: 8342]
[Avg operation time (ms): 2.219094035]
[Min successful operation time (ms): 0.3558]
[Max successful operation time (ms): 33.0746]
[95th percentile operation time (ms): 7.76223]
[98th percentile operation time (ms): 10.60813]
[99th percentile operation time (ms): 11.820664]

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=20000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=True

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 20000]
[Total docsize: 0]
[Total operations: 20000]
[Failed operations: 0]
[Total time (ms): 20234.7412]
[Avg operations per/second: 988]
[Avg operation time (ms): 0.988603275000007]
[Min successful operation time (ms): 0.3916]
[Max successful operation time (ms): 26.6593]
[95th percentile operation time (ms): 2.3936]
[98th percentile operation time (ms): 5.985864]
[99th percentile operation time (ms): 7.25766099999999]

–100k–

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=100000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=False

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 100000]
[Total docsize: 0]
[Total operations: 100000]
[Failed operations: 0]
[Total time (ms): 11173.8347]
[Avg operations per/second: 8949]
[Avg operation time (ms): 2.85144512600003]
[Min successful operation time (ms): 0.2805]
[Max successful operation time (ms): 36.6904]
[95th percentile operation time (ms): 9.03734999999999]
[98th percentile operation time (ms): 11.352522]
[99th percentile operation time (ms): 12.777946]

Running with options:
Nodes=“http://…-1…:8091 http://…-2…:8091 http://…-3…:8091 http://…-4…:8091 http://…-5…:8091 http://…-6…:8091”
Bucket="…"
BucketPassword="…"
NumOfClients=1
WorkloadSize=100000
DocSamplePath=""
DocKeyPrefix=“mm”
DocKeySeed=0
DocKeyRange=1000
ClusterUsername="…"
ClusterPassword="…"
WarmupMs=100
Verbose=False
MutationPercentage=0.33
WorkloadType=MutationPercentage
EnableOperationTiming=True
UseJson=False
UseSync=True

Preparing bucket:
Running workloads…

[Completed workload: Mix of Get and Set (0.33%) operations against JSON doc(s) with doc size: 156.]
[Workload size: 100000]
[Total docsize: 0]
[Total operations: 100000]
[Failed operations: 0]
[Total time (ms): 94942.9585]
[Avg operations per/second: 1053]
[Avg operation time (ms): 0.927117849000002]
[Min successful operation time (ms): 0.1326]
[Max successful operation time (ms): 90.5195]
[95th percentile operation time (ms): 1.945]
[98th percentile operation time (ms): 5.14612000000001]
[99th percentile operation time (ms): 6.88121099999999]

Hello all,

@xulane, did you find a solution to your performance issues with the .NET SDK?
We are seeing the same issues with the .NET SDK at my company, using SDK version 2.7.6.

Best regards,

Sorry, I lost the credentials to that account, and can’t recover them.

We ended up rewriting the way we use Couchbase to store a larger single JSON document instead of many smaller items. For small bulk requests (fewer than ~16 items), we used couchbaseBucket.LookupIn<dynamic>(docKey) to pull individual pieces from within the JSON; for larger requests, we pulled the entire document. It was more complicated to implement, but it worked for our use case.
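As a rough sketch of that sub-document read path (the field paths here are illustrative, not our real schema):

```csharp
// Sub-document lookup: fetch only the needed fragments of one large JSON doc.
// "items.a" / "items.b" are illustrative paths; docKey/Entity are placeholders.
var fragment = couchbaseBucket.LookupIn<dynamic>(docKey)
    .Get("items.a")
    .Get("items.b")
    .Execute();

if (fragment.Success)
{
    var a = fragment.Content<Entity>("items.a");
    var b = fragment.Content<Entity>("items.b");
}
```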

We could never get the performance we needed doing bulk gets/sets of many small items compared to the older SDKs.
Note: We aren’t using the new 3.0 SDK, so I can’t speak to that.