Performance degradation after upgrading to v3.3.6

Hi, everyone
We recently upgraded a mission-critical application to SDK v3.3.6 and noticed a considerable performance loss of around 50%.
We did some profiling and noticed that the SDK throws an exception when the server responds that the document does not exist (DocumentNotExist), and we are worried that this is causing the performance loss we’ve detected.
Here is the benchmark we ran. In the benchmark report you will notice a “TryGetAsync” case, which is an experiment where I refactored some parts of the SDK to avoid throwing the exception, for comparison.

Is it possible to provide the TryGetAsync method? Something like:
Task<(bool, IGetResult)> TryGetAsync(string id, GetOptions? options = null);
Additional reading: Exceptions and Performance - Framework Design Guidelines | Microsoft Learn

        [Benchmark(Baseline = true)]
        public void GetAsync()
        {
            var tasks = new Task[GetPerOperation];

            // Start GetPerOperation concurrent gets against the same key.
            for (var i = 0; i < GetPerOperation; i++)
            {
                tasks[i] = Task.Run(async () =>
                {
                    try
                    {
                        await _couchbaseCollection.GetAsync("my-document-key");
                    }
                    catch (Exception)
                    {
                        // Swallowed on purpose; only the cost of the call is measured.
                    }
                });
            }

            Task.WaitAll(tasks);
        }

        [Benchmark]
        public void TryGetAsync()
        {
            var tasks = new Task[GetPerOperation];

            // Same workload as the baseline, using the experimental TryGetAsync.
            for (var i = 0; i < GetPerOperation; i++)
            {
                tasks[i] = Task.Run(async () =>
                {
                    try
                    {
                        await _couchbaseCollection.TryGetAsync("my-document-key");
                    }
                    catch (Exception)
                    {
                        // Swallowed on purpose, mirroring the baseline.
                    }
                });
            }

            Task.WaitAll(tasks);
        }

Profiling with JetBrains dotTrace (occurrences of DocumentNotFoundException in only 1 request):


It seems that with the TryGetAsync approach we recover some performance, mainly in scenarios with higher concurrency.

A few questions:

  1. What version of the SDK were you using previous to 3.3.6? (You said 3.6.6, so I’m assuming you meant 3.3.6)
  2. Can you provide the code for TryGetAsync?

Also, note that spooling large numbers of simultaneous operations is a known performance bottleneck in some cases, as it floods the connections and thread pool and can result in timeouts. We generally recommend limiting the degree of parallelism and you’ll get better throughput. We have started experimenting with an in-flight operation limit to reduce this somewhat, but nothing is complete yet.
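As a rough illustration of limiting the degree of parallelism, here is a minimal sketch that caps the number of in-flight operations with a SemaphoreSlim. The helper name `BoundedParallelism.RunAsync` and the `maxInFlight` parameter are made up for this example and are not part of the SDK; the right limit depends on your cluster and workload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK type): runs one operation per key while
// keeping at most `maxInFlight` operations pending at any moment.
public static class BoundedParallelism
{
    public static async Task<TResult[]> RunAsync<TResult>(
        IEnumerable<string> keys,
        Func<string, Task<TResult>> operation,
        int maxInFlight)
    {
        using var gate = new SemaphoreSlim(maxInFlight);

        var tasks = keys.Select(async key =>
        {
            // Wait for a free slot before starting the operation.
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                return await operation(key).ConfigureAwait(false);
            }
            finally
            {
                // Free the slot even if the operation failed.
                gate.Release();
            }
        }).ToArray();

        // Results come back in the same order as the input keys.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

With the SDK this could be called as `await BoundedParallelism.RunAsync(keys, key => collection.GetAsync(key), maxInFlight: 16)`, where 16 is an arbitrary starting point to tune, not a recommended value.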

Hi, @btburnett3
Sorry, you are correct, it is version 3.3.6. The previous SDK was v2.7.11.
I’ve opened a PR with the proposed solution, for discussion purposes: TryGetAsync proposed solution by kaiohenrique · Pull Request #120 · couchbase/couchbase-net-client · GitHub
But notice the exception counts in the benchmark; they are alarming. Also, in this forum I saw other SDK users who are worried about this exception-based implementation. Shouldn’t we reconsider this design?

Please also notice that with the TryGetAsync approach we got very good numbers. Being 3x faster in the case of 200 async gets per benchmark iteration is very promising. What do you think?

@kaiohenrique -

Hi, thanks for posting! After reading through the thread and a quick review of the PR that you pushed (thanks!), I’m pretty sure what you’re seeing is a side effect of the changes that were made between the SDK 2 and SDK 3 APIs.

There is some agreement about it not being the most performant way of handling errors; it doesn’t make sense in certain cases such as KeyNotFound where it’s not an exception, but possibly an expected state. The good news is that we have already refactored the internals to support something like what you did in your PR. These changes were introduced in 3.4.2 (see NCBC-2167) and do improve performance overall, but lack the support for handling the KeyNotFound case.

The idea was not to widen the API (ICouchbaseCollection in .NET), but to provide extension methods that allowed you to handle certain responses by bubbling up the status and not the exception or perhaps providing a binary response like your PR. The design here though is still in flux.

Hi, @jmorris
It is not clear to me how I can extend CouchbaseCollection. Can you please provide an example?
Also, is it necessary to extend RetryOrchestrator, since it still throws an exception (when non-retryable)? [reference]

@kaiohenrique

I don’t mean providing a new implementation of CouchbaseCollection, I mean providing extension methods that provide an alternative interface which is non-breaking for existing clients. To do this, internal changes would need to be made so that only when these methods were called the behavior would change.

Yes, some exceptions are still thrown internally, especially if they are not retriable. Changing this would be a breaking change for existing consumers and isn’t an option.

@jmorris
Can we have a hotfix for that? Perhaps a new interface with a method similar to TryGetAsync?
It would be terrible for us to downgrade to 2.7, and we are in a hurry to go to production.

I have benchmarked 3.4.2 and 3.3.6 against 2.7.11 (Couchbase Server 6.6)


I still think that the nature of the benchmark, running so much in parallel, isn’t a real-world scenario and is skewing the results. Writing benchmarks well is hard.

That said, I do generally agree that DocumentNotFoundException could be improved upon, especially as regards performance. I know it was written that way to be consistent with the SDKs in other languages, but it isn’t a great fit in .NET in my opinion.

The problem, as mentioned by Jeff, is backward compatibility. If we do anything, it must not be a breaking change. However, an extension method is tricky. The simple implementation of an extension method would be to catch the exception and handle it more gracefully, but that won’t help from a performance perspective.
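To make the "simple implementation" concrete, here is a minimal sketch of an extension method that just catches the exception. The `IGetResult`, `ICouchbaseCollection`, `DocumentNotFoundException` and `InMemoryCollection` types below are simplified stand-ins written for this example so that it compiles on its own; they are not the SDK's real definitions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-ins for the SDK types (assumptions for this sketch).
public sealed class DocumentNotFoundException : Exception { }

public interface IGetResult { string Content { get; } }

public interface ICouchbaseCollection
{
    Task<IGetResult> GetAsync(string id);
}

// Tiny in-memory collection that throws on a missing key, like GetAsync does.
public sealed class InMemoryCollection : ICouchbaseCollection
{
    private sealed class Result : IGetResult
    {
        public Result(string content) => Content = content;
        public string Content { get; }
    }

    private readonly Dictionary<string, string> _docs;
    public InMemoryCollection(Dictionary<string, string> docs) => _docs = docs;

    public Task<IGetResult> GetAsync(string id)
    {
        if (!_docs.TryGetValue(id, out var content))
            throw new DocumentNotFoundException();
        return Task.FromResult<IGetResult>(new Result(content));
    }
}

public static class NaiveTryGetExtensions
{
    // Friendlier to call, but the exception is still thrown and caught on
    // every miss, so the cost being discussed in this thread is unchanged.
    public static async Task<(bool Exists, IGetResult? Result)> TryGetAsync(
        this ICouchbaseCollection collection, string id)
    {
        try
        {
            return (true, await collection.GetAsync(id).ConfigureAwait(false));
        }
        catch (DocumentNotFoundException)
        {
            return (false, null);
        }
    }
}
```

The wrapper only changes the shape of the call site. Avoiding the throw itself requires changes inside the SDK.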

I have two possible proposals, each with pros and cons.

Proposal #1: Add an option to GetOptions, something like ExceptionIfMissing defaulted to true. This would maintain current behavior by default. Then change IGetResult to include an Exists property which can be checked before GetContent in the case where you opt-out of the exception. Finally, change missing documents to only throw an exception later, in ICouchbaseCollection.GetAsync, if exceptions are expected based on GetOptions. We’d probably also throw the same exception in GetContent if that gets called.

The downside of this proposal is it’s a bit of a departure from the RFC for the SDKs, but I think it’s easy to use.
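A rough sketch of how the first proposal could look from the caller's side. All types here are hypothetical stand-ins written for illustration; `ExceptionIfMissing`, `Exists`, `GetContent` and `CompleteGet` only mirror the names used in the description above and do not reflect the SDK's actual API.

```csharp
#nullable enable
using System;

public sealed class DocumentNotFoundException : Exception { }

// Hypothetical options type carrying the opt-out flag; defaults to true
// so existing callers keep the current throwing behavior.
public sealed class GetOptions
{
    public bool ExceptionIfMissing { get; set; } = true;
}

// Hypothetical result type with the proposed Exists property.
public sealed class GetResult
{
    private readonly string? _content;
    public GetResult(string? content) => _content = content;

    public bool Exists => _content != null;

    // Reading the content of a missing document still throws.
    public string GetContent() => _content ?? throw new DocumentNotFoundException();
}

public static class ProposalOneSketch
{
    // Models the last step of GetAsync: the lower layers have already
    // returned `rawContent` (null for a missing key) without throwing,
    // and the exception is raised here only if the caller expects it.
    public static GetResult CompleteGet(string? rawContent, GetOptions? options = null)
    {
        var result = new GetResult(rawContent);
        if (!result.Exists && (options?.ExceptionIfMissing ?? true))
            throw new DocumentNotFoundException();
        return result;
    }
}
```

A caller that opts out would pass `new GetOptions { ExceptionIfMissing = false }` and check `Exists` before calling `GetContent`.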

Proposal #2: In this case, the internals would be changed in exactly the way I describe in Proposal #1; however, the new properties would be kept internal so they don’t pollute the API surface. Then a public extension method TryGetAsync would be added which uses the new internals. I’d recommend a return type of ITryGetResult that inherits from IGetResult and adds the Exists property publicly.

The advantage of this approach is it doesn’t pollute the official API surface, just adds an extension, but it’s a bit more convoluted.
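For comparison, a sketch of the second proposal under the same caveats: every type below is a simplified stand-in written for this example, and `GetCoreAsync` is an invented name for the internal non-throwing path. Only `TryGetAsync`, `ITryGetResult`, `IGetResult` and `Exists` come from the description above.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public sealed class DocumentNotFoundException : Exception { }

public interface IGetResult { string GetContent(); }

// Proposed public addition: IGetResult plus an Exists property.
public interface ITryGetResult : IGetResult { bool Exists { get; } }

// Stand-in for the SDK's collection class; `lookup` returns null for a miss.
public sealed class CouchbaseCollection
{
    private readonly Func<string, string?> _lookup;
    public CouchbaseCollection(Func<string, string?> lookup) => _lookup = lookup;

    // Internal, non-throwing path shared by both public entry points.
    internal Task<ITryGetResult> GetCoreAsync(string id) =>
        Task.FromResult<ITryGetResult>(new TryGetResult(_lookup(id)));

    // Existing public behavior is preserved: a miss still throws.
    public async Task<IGetResult> GetAsync(string id)
    {
        var result = await GetCoreAsync(id).ConfigureAwait(false);
        if (!result.Exists) throw new DocumentNotFoundException();
        return result;
    }

    private sealed class TryGetResult : ITryGetResult
    {
        private readonly string? _content;
        public TryGetResult(string? content) => _content = content;
        public bool Exists => _content != null;
        public string GetContent() => _content ?? throw new DocumentNotFoundException();
    }
}

public static class CouchbaseCollectionExtensions
{
    // Public extension that exposes the internal path without widening
    // the collection's own public surface.
    public static Task<ITryGetResult> TryGetAsync(
        this CouchbaseCollection collection, string id) =>
        collection.GetCoreAsync(id);
}
```

Existing callers of `GetAsync` see no change, while `TryGetAsync` callers check `Exists` and never pay for an exception on a miss.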

@jmorris I’m curious about your thoughts on this. If there is an option you like I’m happy to take on the work.


Hi @kaiohenrique/@btburnett3

Thanks for posting; this doesn’t exactly follow the “hot fix” criteria, but would be a new feature. We are investigating a solution and will follow up.


Hi @jmorris, have you had a chance to investigate the performance issue in the Couchbase .NET SDK? I see significant performance degradation after upgrading to version 3.4.5 in a large application that uses Couchbase intensively.

@ytatarynov

With respect to the topic of the OP, yes, we added a TryGetAsync method and reworked the internals of the SDK so that exceptions are not thrown and re-caught. This was released in later SDK versions and improved performance significantly.

Your comment does not give enough specific information to answer. If you wish, I suggest you create a new post and explain in detail what performance issue you are running into.

-Jeff