Success but empty!

david.allaigre · February 22, 2017, 7:25pm

Hi,

We are using Couchbase Server 4.5.1 and the Couchbase .NET client 2.4.0. To find the most efficient way to do a bulk get, we are stressing the system with a large number of parallel GetAsync calls. Many responses come back with a Success status and no value.
By debugging the client we saw that in those cases we always receive a 24-byte array that is entirely empty except for the opaque uint. Because the Success status has the value 0, the response is reported as a success!

So do you have any idea where the issue might be? And would it be possible to get a more accurate status?

Best regards,

David.
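
For readers following along, here is a minimal sketch of why an all-zero 24-byte buffer reads as Success. It assumes the standard memcached binary protocol response header layout (magic at byte 0, status at bytes 6-7, total body length at bytes 8-11, opaque at bytes 12-15, all big-endian). The class and method names are hypothetical and are not part of the Couchbase client.

```csharp
using System;

// Hypothetical helper, not part of the Couchbase client.
public static class ResponseHeader
{
    public const int Length = 24;          // size of a binary protocol header
    public const byte ResponseMagic = 0x81; // magic byte of a genuine response

    // Reads a big-endian unsigned 16-bit value.
    private static ushort ReadUInt16(byte[] b, int offset) =>
        (ushort)((b[offset] << 8) | b[offset + 1]);

    // Reads a big-endian unsigned 32-bit value.
    private static uint ReadUInt32(byte[] b, int offset) =>
        ((uint)b[offset] << 24) | ((uint)b[offset + 1] << 16) |
        ((uint)b[offset + 2] << 8) | b[offset + 3];

    public static ushort Status(byte[] header) => ReadUInt16(header, 6);
    public static uint BodyLength(byte[] header) => ReadUInt32(header, 8);
    public static uint Opaque(byte[] header) => ReadUInt32(header, 12);

    // A zeroed buffer has status 0 (Success) but lacks the response magic,
    // so it cannot be a real server reply.
    public static bool LooksLikePlaceholder(byte[] header) =>
        header.Length == Length
        && header[0] != ResponseMagic
        && Status(header) == 0
        && BodyLength(header) == 0;
}
```

Checking the magic byte is one way to tell a buffer that was never filled by the server from a real reply with status 0.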

jmorris · February 22, 2017, 7:34pm

@david.allaigre -

It sounds like a bug, can you create a Jira ticket? Please include an example of the code you are using so we can try to replicate, as well as OS and whether or not your project is .NET Full or .NET Core.

Also, the best performance for GetAsync in bulk is to create a list of GetAsync tasks and then use await Task.WhenAll(tasks);

-Jeff
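
The pattern Jeff describes can be sketched roughly as follows. To keep the example self-contained, the fetch delegate stands in for a call such as key => bucket.GetAsync<T>(key). The BulkGet class and GetAllAsync name are hypothetical, not SDK API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Couchbase client.
public static class BulkGet
{
    // fetch is an assumed stand-in for something like
    // key => bucket.GetAsync<T>(key).
    public static async Task<Dictionary<string, TResult>> GetAllAsync<TResult>(
        IEnumerable<string> keys,
        Func<string, Task<TResult>> fetch)
    {
        // Materialize the keys once so results can be matched by index.
        var keyList = keys.Distinct().ToList();

        // ToList() starts every operation before we await any of them.
        var tasks = keyList.Select(fetch).ToList();

        // WhenAll returns results in the same order as the input tasks.
        TResult[] results = await Task.WhenAll(tasks);

        var byKey = new Dictionary<string, TResult>(keyList.Count);
        for (int i = 0; i < keyList.Count; i++)
        {
            byKey[keyList[i]] = results[i];
        }
        return byKey;
    }
}
```

Each result still has to be inspected individually, since Task.WhenAll only reports that the tasks completed, not that every operation returned a value.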

david.allaigre · February 22, 2017, 7:58pm

Hi Jeff,

Thanks a lot for your answer. I don’t know how to post something in Jira. As for the code, this is exactly what we are doing: await Task.WhenAll(keys.Select(k => bucket.GetAsync(k)));
The connection pool has MaxSize=100, and we are requesting 70,000 keys.

For now we have two kinds of errors:

the initializing error
the success with no values.

For now we are just putting that code in a try/retry loop for all the keys not successfully returned. Usually we need 5 attempts to get them all in 6s…

Regards,

David.
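
A retry loop of the kind David describes might look like the sketch below. It assumes a hypothetical tryFetch delegate that wraps the real get, catches its exceptions, and reports Ok = false both for failures and for a "success" that carries no value. FetchOutcome and RetryingBulkGet are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical result wrapper: Ok is false for errors and for
// "success" responses that carry no value.
public sealed class FetchOutcome<T>
{
    public bool Ok { get; }
    public T Value { get; }
    public FetchOutcome(bool ok, T value) { Ok = ok; Value = value; }
}

// Hypothetical helper, not part of the Couchbase client.
public static class RetryingBulkGet
{
    public static async Task<Dictionary<string, TValue>> GetAllAsync<TValue>(
        IEnumerable<string> keys,
        Func<string, Task<FetchOutcome<TValue>>> tryFetch,
        int maxAttempts)
    {
        var found = new Dictionary<string, TValue>();
        var pending = keys.Distinct().ToList();

        for (int attempt = 1; attempt <= maxAttempts && pending.Count > 0; attempt++)
        {
            // Issue all pending gets in parallel and wait for the batch.
            var outcomes = await Task.WhenAll(pending.Select(tryFetch));

            // Keep what succeeded; only the failed keys go to the next round.
            var stillPending = new List<string>();
            for (int i = 0; i < pending.Count; i++)
            {
                if (outcomes[i].Ok) found[pending[i]] = outcomes[i].Value;
                else stillPending.Add(pending[i]);
            }
            pending = stillPending;
        }

        // Keys missing from the result were not retrieved within maxAttempts.
        return found;
    }
}
```

Retrying only the keys that failed keeps each later round smaller than the one before it.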

jmorris · February 22, 2017, 8:20pm

@david.allaigre -

I found the problem: it’s a bug in how the client handles a timeout when the remote is not responsive. In my case, my cluster is running on a VM and the load is simply too much for the resources allocated to it, so one of the nodes became unresponsive. The client timing out non-responsive connections isn’t a bug, but how it handles the timeout is wrong in that it overwrites the failure with a success response.

I should have a patch soon so that the correct error message is returned when a timeout occurs.

-Jeff

david.allaigre · February 22, 2017, 9:18pm

Thanks a lot Jeff,

I’m really looking forward to the patch.

Best regards,

David.

jmorris · February 22, 2017, 11:00pm

@david.allaigre -

Can you let me know the OS and whether you are using .NET Core or Full framework? Ticket: Loading...

Also, the patch won’t change the fact that the timeout is occurring; it will simply return the correct response when an operation times out. You’ll want to look into what’s going on with the server or network.

-Jeff

david.allaigre · February 23, 2017, 12:02am

Of course, I’m not expecting you to solve the underlying issue we are facing! We have high CPU usage and high network latency in our cluster farms. We are investigating that issue with the Couchbase support team and our network team.

We are mainly using Windows Server 2012 R2 on VMware with .NET 4.6.2 for server and client.

Having the timeout exception will already help us a lot, at least for implementing a retry mechanism.

David.

david.allaigre · February 23, 2017, 4:17pm

Another point,

In the current client code, at the operation level, the ShouldRetry method does not handle the NodeUnavailableException -> Initializing case. I was wondering whether that case should be a “Retry”?

David.

jmorris · February 23, 2017, 6:07pm

@david.allaigre -

A NodeUnavailableException indicates that the client cannot connect or maintain a connection to a cluster node. This would happen if the node goes offline or the service crashes and cannot restart. In this case, the client should not retry because usually this ends up burning CPU while the client does its retry loop. The application using the client could decide that in this case it wants to retry and then go ahead and use its own retry logic. It’s really up to the application’s requirements.

In this case (NodeUnavailableException=>Initializing), I don’t think the logic is correct here… this should really be a retry state unless the operation lifespan has been exceeded or it’s a write without a CAS. I created a ticket for following the progress: Loading.... From your application’s perspective it’s fine to retry in this case.

-Jeff
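
The rule Jeff states for the Initializing case (retry unless the operation lifespan has been exceeded or the operation is a write without a CAS) could be expressed in application code along these lines. This is only a sketch of that stated rule, not the client's actual logic. RetryPolicy, CanRetryWhileInitializing and its parameters are hypothetical names.

```csharp
using System;

// Hypothetical application-side policy, not part of the Couchbase client.
public static class RetryPolicy
{
    public static bool CanRetryWhileInitializing(
        TimeSpan elapsed,   // time already spent on this operation
        TimeSpan lifespan,  // maximum time the operation may take overall
        bool isWrite,       // true for mutations, false for reads
        bool hasCas)        // true if the write carries a CAS value
    {
        // Give up once the operation has used up its lifespan.
        if (elapsed >= lifespan) return false;

        // A write without a CAS is not retried.
        if (isWrite && !hasCas) return false;

        // Reads, and writes guarded by a CAS, can be retried.
        return true;
    }
}
```

An application could call this before re-issuing a failed operation, for example inside its own retry loop around GetAsync.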
