Multiple Get Operation

gok.gokalp · July 21, 2017, 7:22am

Hello.
Currently, we are using version 2.4.7 of the .NET SDK.
When we try to get multiple items from the bucket (e.g. 1200-1500 items), we always see high response times. We tried the same thing with the Java SDK, and everything was fine.

What’s the best practice for getting multiple items? (We have also tried many different pool and bucket configurations.)
When we tried using a view instead of the SDK’s multi-get function, we saw high CPU usage on the Couchbase server.
We also don’t want to use the SDK’s multi-get function, because it uses Parallel.ForEach, which is unacceptable for us. How can we do this with a single query? (Maybe N1QL, but I’m not sure.)

MikeGoldsmith · July 21, 2017, 9:14am

Hi @gok.gokalp

Please can you answer the following questions to help me better understand the problem you’re having:

- Please can you share your code that you are using that is resulting in high latency?
- Please can you also share the client configurations you’ve tried?
- Are you working in a sync or async scenario?
- How do you work out the keys you want to retrieve?

Regarding not using the SDK’s MultiGet functionality: why is Parallel.ForEach unacceptable? We have found that dispatching parallel requests across many TCP connections is more efficient and performant than a single batch get. Since SDK version 2.4, we also use multiplexing connections, which allow data to be pipelined in both directions concurrently for further efficiency gains.

I do agree a view is probably not the right way to retrieve documents: it introduces latency between a write and when that write can be read through the view, and, as you have seen, it is more CPU-intensive. N1QL may be an option, but if you can determine the keys without a query, direct KV operations are faster.

Thanks

gok.gokalp · July 21, 2017, 11:26am

Hello Mike, first of all, thanks for your interest.

- Please can you share your code that you are using that is resulting in high latency?

Our code block looks like this:

    public IEnumerable<VariantStockInfo> GetVariantInfoList(IEnumerable<string> ids)
    {
        IDictionary<string, IOperationResult<StockInfo>> operationResults = null;

        operationResults = _bucket.Get<StockInfo>(ids.ToList(), new ParallelOptions
        {
            MaxDegreeOfParallelism = 2 // can change
        });

        //...
    }

- Please can you also share the client configurations you’ve tried?

This is our current configuration, which works best for us:

    ClientConfiguration clientConfiguration = new ClientConfiguration
    {
        Servers = GetCouchBaseUris(),
        ConnectionPoolCreator = ConnectionPoolFactory.GetFactory<ConnectionPool<MultiplexingConnection>>(),
        IOServiceCreator = IOServiceFactory.GetFactory<MultiplexingIOService>()
    };

We also tried this, based on some articles:

    ClientConfiguration clientConfiguration = new ClientConfiguration
    {
        Servers = GetCouchBaseUris(),
        Serializer = () => new CustomJilSerializer(),
        EnableConfigHeartBeat = true,
        PoolConfiguration = new PoolConfiguration
        {
            MaxSize = 35,
            MinSize = 5
        },
        BucketConfigs = new Dictionary<string, BucketConfiguration>
        {
            {
                variantStockBucket, new BucketConfiguration
                {
                    PoolConfiguration = new PoolConfiguration
                    {
                        MaxSize = 35,
                        MinSize = 5 // MinSize must not exceed MaxSize
                    },
                    BucketName = variantStockBucket
                }
            }
        }
    };

- Are you working in a sync or async scenario?

We tried both, with the “GetDocumentsAsync<>” and “Get<>” methods. For multiple gets, the “GetDocumentsAsync<>” method looks more efficient than the “Get<>” method.

- How do you work out the keys you want to retrieve?

I don’t understand this question.

- Why is the Parallel.ForEach unacceptable?

Yes, you’re right, Parallel is a good option for most use cases. But it depends on the environment. It seems like the multiplexer can’t manage the TCP connections, because it creates many more connections than the number of keys I use. Parallel calls may be feasible with 100-200 keys, but when I need to work with more keys (1000-5000), we begin facing connection problems.

Regards

MikeGoldsmith · July 21, 2017, 12:20pm

Ok, thanks for the additional information.

As part of the 2.4.6 release we introduced the SharedConnectionPool (used by default) that maintains a pool of multiplexing connections. The ConnectionPoolCreator you have in your current configuration is the old type that only had one multiplexing connection per server/bucket combination.

Please can you use this instead:

ConnectionPoolCreator = ConnectionPoolFactory.GetFactory<SharedConnectionPool<MultiplexingConnection>>()

or just remove it from your configuration, since that is the default type. The IOServiceCreator you have is also the default, so it could be omitted from the config.
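A trimmed configuration along those lines might look like the sketch below. It is a fragment rather than a full program: `GetCouchBaseUris()` is the helper from the configuration posted earlier in the thread, and "variant-stock" is a placeholder bucket name.

```csharp
using Couchbase;
using Couchbase.Configuration.Client;

var config = new ClientConfiguration
{
    // GetCouchBaseUris() is the helper from the posted configuration.
    Servers = GetCouchBaseUris()
    // ConnectionPoolCreator and IOServiceCreator are deliberately left unset,
    // so 2.4.6+ falls back to its defaults: SharedConnectionPool<MultiplexingConnection>
    // and the multiplexing IO service.
};

var cluster = new Cluster(config);
var bucket = cluster.OpenBucket("variant-stock"); // placeholder bucket name
```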

I also have this Gist that shows how to do an async version of multi get.

- How do you work out the keys you want to retrieve?
What I mean is: how do you work out the keys you are going to bulk retrieve?

Thanks

btburnett3 · July 21, 2017, 12:38pm

@gok.gokalp

I personally agree that Parallel.ForEach may not be the most efficient approach. It’s treating the Get process as processor constrained, running only as many get operations at a time as you have cores. I think it’s probably more network constrained, waiting on responses from the server, so more operations running simultaneously could increase throughput. But that would require some experimentation.

However, I would recommend removing the MaxDegreeOfParallelism setting. It is probably reducing the number of simultaneous get operations you’re running, since you likely have more than 2 cores. If you leave ParallelOptions off, I believe Parallel.ForEach will use the number of system cores automatically.
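A minimal sketch of that idea, assuming the workload is network-bound as described: rather than Parallel.ForEach, start async gets and cap how many are in flight with a `SemaphoreSlim`, which also bounds the number of concurrent requests when fetching thousands of keys. `FetchAsync` is a hypothetical stand-in for `bucket.GetAsync<T>(key)`, simulated with `Task.Delay` so the sketch runs without a cluster; the limit of 64 and the `stock::` key format are arbitrary example values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for bucket.GetAsync<T>(key): waits ~10 ms to mimic a
// network round trip, then returns a fake document body.
async Task<string> FetchAsync(string key)
{
    await Task.Delay(10);
    return "doc:" + key;
}

// Fetches every key concurrently, but with at most maxInFlight requests
// outstanding at any moment. No thread is blocked while a request waits.
async Task<Dictionary<string, string>> GetManyThrottledAsync(
    IReadOnlyList<string> keyList, int maxInFlight)
{
    using var gate = new SemaphoreSlim(maxInFlight);

    async Task<KeyValuePair<string, string>> FetchOneAsync(string key)
    {
        await gate.WaitAsync();   // wait for a free slot
        try
        {
            var doc = await FetchAsync(key);
            return new KeyValuePair<string, string>(key, doc);
        }
        finally
        {
            gate.Release();       // free the slot for the next key
        }
    }

    // Select creates one task per key; the semaphore decides when each proceeds.
    var pairs = await Task.WhenAll(keyList.Select(FetchOneAsync));
    return pairs.ToDictionary(p => p.Key, p => p.Value);
}

var allKeys = Enumerable.Range(1, 1500).Select(i => "stock::" + i).ToList();
var docs = await GetManyThrottledAsync(allKeys, maxInFlight: 64);

Console.WriteLine(docs.Count);        // 1500
Console.WriteLine(docs["stock::42"]); // doc:stock::42
```

The cap is the value to experiment with: set too low, requests queue behind each other; set too high, the client is back to flooding its connections.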

jmorris · July 21, 2017, 5:04pm

@gok.gokalp -

You’ll likely get better results doing what @MikeGoldsmith suggested and removing:

ConnectionPoolCreator = ConnectionPoolFactory.GetFactory<ConnectionPool<MultiplexingConnection>>(),
IOServiceCreator = IOServiceFactory.GetFactory<MultiplexingIOService>()

From your configuration (since you’re using 2.4.7 and there is a new SharedConnectionPool configured by default).

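Also, for multi-get, the best option is to use the XXXAsync operations with Task.WhenAll, something like this:

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Couchbase;

    // bucket is an already-opened IBucket; this runs inside an async method.
    var keys = new List<string> { "Key1", "Key2", "Key3" };
    var tasks = new List<Task<IOperationResult<string>>>();
    foreach (var key in keys)
    {
        tasks.Add(bucket.GetAsync<string>(key));
    }
    IOperationResult<string>[] results = await Task.WhenAll(tasks);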
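Building on the Task.WhenAll approach: `Task.WhenAll` returns its results in the same order as the tasks passed to it, so each result can be paired back with its key. The sketch below runs without a cluster because it swaps in a hypothetical `FetchAsync` (simulated with `Task.Delay`) for `bucket.GetAsync<string>(key)`; with the real SDK each element would be an `IOperationResult<string>`, and you would check its `Success` property before using the value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for bucket.GetAsync<string>(key); the delay mimics a
// network round trip so the three requests overlap rather than run in turn.
async Task<string> FetchAsync(string key)
{
    await Task.Delay(20);
    return "value-of-" + key;
}

var keys = new List<string> { "Key1", "Key2", "Key3" };

// Start every request first (ToList forces the Select to run now)...
List<Task<string>> tasks = keys.Select(FetchAsync).ToList();

// ...then await them together. values[i] is the result for keys[i].
string[] values = await Task.WhenAll(tasks);

Dictionary<string, string> byKey = keys
    .Zip(values, (k, v) => (Key: k, Value: v))
    .ToDictionary(p => p.Key, p => p.Value);

Console.WriteLine(byKey["Key2"]); // value-of-Key2
```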

gok.gokalp · July 24, 2017, 5:34am

Thanks all.

I’ll try your recommendation today.

@jmorris I think this multi-get code is the same as the SDK’s GetDocumentsAsync method, which I’m currently using, and I get much better results with it than with the other methods.

@MikeGoldsmith btw, I looked at your code snippets, and the last one may cause a race condition. I’ll try it and post the results.

jmorris · July 24, 2017, 7:51pm

@gok.gokalp -

Yes, it’s the same approach.

Related topics

Topic	Category	Replies	Views	Activity
How to do .Get<T>(List keys) in SDK 3.x in C#?	.NET SDK	6	1782	November 12, 2020
Bulk get per node	.NET SDK	2	1548	April 29, 2019
How to perform REAL multi Get	.NET SDK	33	9428	January 24, 2018
Retrieving multiple documents for which I know the key in one call, Java SDK	Java SDK	1	1205	September 6, 2018
Sdk net 3.0.1 vs Sdk net 2.7 Performance Difference	.NET SDK	10	1317	June 10, 2020