We are using Couchbase Server 4.5.1 and the .NET Couchbase client 2.4.0. While stress-testing the system with a large number of parallel GetAsync calls, to find the most efficient way to do a bulk get, we see many responses coming back with a Success status and no value.
By debugging the client, we found that in those cases we always receive a 24-byte array that is entirely empty except for the opaque uint. Because the Success status has the value 0, the response is reported as a success!
Do you have any idea where the issue could be? And would it be possible to get a more accurate status?
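To illustrate why such a buffer reads as a success, here is a minimal sketch that decodes the relevant fields of a 24-byte memcached binary protocol response header. ResponseHeader and its members are hypothetical names, not part of the Couchbase client; the field offsets follow the memcached binary protocol, and reading the opaque as big-endian is an assumption about how the client wrote it.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Couchbase client.
public static class ResponseHeader
{
    public const int Length = 24;            // fixed header size in the binary protocol
    public const byte ResponseMagic = 0x81;  // first byte of every genuine response

    // Status is a big-endian 16-bit value at offsets 6-7; 0 means Success.
    public static ushort Status(byte[] h)
    {
        return (ushort)((h[6] << 8) | h[7]);
    }

    // Opaque is a 32-bit value at offsets 12-15, echoed back from the request.
    // Assumption: it was written big-endian.
    public static uint Opaque(byte[] h)
    {
        return ((uint)h[12] << 24) | ((uint)h[13] << 16) | ((uint)h[14] << 8) | h[15];
    }

    // A zero-filled buffer fails this check even though its status decodes to 0.
    public static bool LooksValid(byte[] h)
    {
        return h != null && h.Length >= Length && h[0] == ResponseMagic;
    }
}
```

For a zero-filled 24-byte array with only the opaque bytes set, Status returns 0 (Success) while LooksValid returns false, because the magic byte is 0 rather than 0x81. Checking the magic byte is one possible way to tell such a buffer apart from a real response.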
It sounds like a bug, can you create a Jira ticket? Please include an example of the code you are using so we can try to replicate, as well as OS and whether or not your project is .NET Full or .NET Core.
Also, the best performance for GetAsync in bulk is to create a list of GetAsync tasks and then use await Task.WhenAll(tasks);
Thanks a lot for your answer. I don’t know how to post something in Jira. As for the code, this is exactly what we are doing: await Task.WhenAll(keys.Select(k => bucket.GetAsync(k)));
The connection pool has MaxSize=100, and we are requesting 70,000 keys.
For now we have two kinds of errors:
the initializing error
the success with no values.
For now we are just wrapping that code in a retry loop for all the keys that were not successfully returned. Usually we need 5 attempts to get them all, in about 6 s…
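The retry loop described above could look roughly like the following sketch. BulkGet, GetAllAsync and the fetch delegate are hypothetical names; fetch stands in for a call such as bucket.GetAsync plus a check that the result really carries a value, and it is assumed to report failures through its return value rather than by throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the Couchbase client.
public static class BulkGet
{
    // fetch returns (found, value); found must be false for a failed response
    // and for a "success" that carries no value.
    public static async Task<Dictionary<string, T>> GetAllAsync<T>(
        IEnumerable<string> keys,
        Func<string, Task<Tuple<bool, T>>> fetch,
        int maxAttempts)
    {
        var results = new Dictionary<string, T>();
        var pending = keys.Distinct().ToList();

        for (var attempt = 1; attempt <= maxAttempts && pending.Count > 0; attempt++)
        {
            // Start every pending get, then wait for all of them together.
            var batch = pending;
            var responses = await Task.WhenAll(batch.Select(fetch));

            // Keep what succeeded; queue the rest for the next attempt.
            pending = new List<string>();
            for (var i = 0; i < batch.Count; i++)
            {
                if (responses[i].Item1)
                    results[batch[i]] = responses[i].Item2;
                else
                    pending.Add(batch[i]);
            }
        }

        // Keys still missing from the dictionary failed on every attempt.
        return results;
    }
}
```

Capping the number of attempts keeps the loop from spinning forever when a node stays unresponsive; the caller can compare the returned dictionary against the requested keys to see what is still missing.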
I found the problem: it’s a bug in how the client handles a timeout when the remote is not responsive. In my case, my cluster is running on a VM and the load is simply too much for the resources allocated to it - one of the nodes became unresponsive. The client timing out non-responsive connections isn’t a bug, but how it handles the timeout is wrong, in that it’s overwriting the failure with a success response.
I should have a patch soon so that the correct error message is returned when a timeout occurs.
Also, the patch won’t change the fact that the timeout is occurring; it will simply return the correct response when an operation times out. You’ll want to look into what’s going on with the server or network.
Of course, I’m not expecting you to solve the underlying issue we are facing! We have high CPU usage and high network latency in our cluster farms. We are investigating that issue with the Couchbase support team and our network team.
We are mainly using Windows Server 2012 R2 on VMware with .NET 4.6.2 for both server and client.
Having the timeout exception will already help us a lot, at least for implementing a retry mechanism.
A NodeUnavailableException indicates that the client cannot connect, or cannot maintain a connection, to a cluster node. This happens if the node goes offline or the service crashes and cannot restart. In this case, the client should not retry, because that usually ends up burning CPU while the client runs its retry loop. The application using the client could decide that it wants to retry in this case and use its own retry logic. It’s really up to the application’s requirements.
In this case (NodeUnavailableException=>Initializing), I don’t think the logic is correct here… this should really be a retry state unless the operation lifespan has been exceeded or it’s a write without a CAS. I created a ticket to track the progress: https://issues.couchbase.com/browse/NCBC-1339. From your application’s perspective, it’s fine to retry in this case.
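The retry rule stated above can be written down as a small predicate. This is only a sketch of that rule for application-side retry logic; RetryPolicy, ShouldRetry and their parameters are hypothetical names and do not reflect how the client implements it internally.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Couchbase client.
public static class RetryPolicy
{
    // Retry unless the operation lifespan has been exceeded
    // or the operation is a write without a CAS.
    public static bool ShouldRetry(TimeSpan elapsed, TimeSpan lifespan, bool isWrite, bool hasCas)
    {
        if (elapsed >= lifespan) return false;   // operation lifespan exceeded
        if (isWrite && !hasCas) return false;    // write without a CAS
        return true;
    }
}
```

Under this rule a read, such as the bulk gets in this thread, is always eligible for a retry until its lifespan runs out.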