[NCBC-257] During rebalance client tries to connect the primary node only Created: 02/May/13  Updated: 15/May/13

Status: Open
Project: Couchbase .NET client library
Component/s: library
Affects Version/s: 1.2.6
Fix Version/s: 1.2.7

Type: Bug Priority: Blocker
Reporter: Saakshi Manocha Assignee: Saakshi Manocha
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
I'm filing this bug to track the performance issue raised in CBSE-521 and CBSE-528.

During the sdkd scenario tests we observed that, while a rebalance is in progress, the client tries to connect only to the primary node and does not connect to the other (secondary) nodes in the cluster. The topology changes during rebalance, and as a result we see many errors such as socket reset, no response received, and operation timeout.
These errors go away once the rebalance is over; no errors are observed in the REBOUND phase.
Please see some sample reports:
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_fo-ept-rb-Sdotnet-1.2-release-T2013-04-02-00.11.35-LV_MC_BASIC.txt
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_rb-2-in-Sdotnet-1.2-release-T2013-04-02-00.21.03-LV_HTTP_BASIC.txt
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_fo-ept-eject-Sdotnet-1.2-release-T2013-04-02-00.17.30-LV_HTTP_BASIC.txt

Mark - need your input here too: do you think these errors during rebalance can impact performance or stability at customer sites?


 Comments   
Comment by Saakshi Manocha [ 02/May/13 ]
Also, as per the documentation and our understanding, we can expect errors during the CHANGE phase, and ideally they should go away in the REBOUND phase.

CHANGE: Here we see that errors start happening. This is because a cluster topology change started around this time. We can expect errors until the topology change is completed. In this case, the topology change was adding a single node to the cluster.
REBOUND: Here we see the errors are stopping. This is because the topology change has been completed. Since we added an extra node to the cluster, the rate of operations has actually gone up from before. This is because there are more nodes to handle requests now.
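
The expectation described above (errors during CHANGE, none during REBOUND) could be checked mechanically along these lines. This is only a rough sketch: the record layout, the `SAMPLE` data, and the function names are hypothetical and are not taken from sdkd or from the linked reports.

```python
from collections import Counter

# Hypothetical per-operation records: (phase, outcome).
# Made-up illustration data, NOT values from the linked sdkd reports.
SAMPLE = [
    ("CHANGE", "socket reset"),
    ("CHANGE", "ok"),
    ("CHANGE", "operation timeout"),
    ("CHANGE", "no response received"),
    ("REBOUND", "ok"),
    ("REBOUND", "ok"),
]


def errors_by_phase(records):
    """Count non-ok outcomes per phase."""
    counts = Counter()
    for phase, outcome in records:
        if outcome != "ok":
            counts[phase] += 1
    return counts


def rebound_is_clean(records):
    """True when no errors were recorded in the REBOUND phase."""
    return errors_by_phase(records)["REBOUND"] == 0
```

With the sample data, `errors_by_phase(SAMPLE)` counts three errors in CHANGE and none in REBOUND, so `rebound_is_clean(SAMPLE)` holds. A run where errors persist into REBOUND would fail the check.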
Comment by Matt Ingenthron [ 10/May/13 ]
This appears to be a critical issue. Marking as blocker for 1.2.7 until we have a better understanding.
Comment by John Zablocki [ 15/May/13 ]
When you say "connect to the primary node only", are you referring to the streaming connection, or are all ops going to the primary node?




[NCBC-78] Enhance discussion of return codes/values Created: 26/Jun/12  Updated: 07/Dec/12

Status: Open
Project: Couchbase .NET client library
Component/s: docs
Affects Version/s: 1.1.6
Fix Version/s: 1.3

Type: Bug Priority: Blocker
Reporter: Perry Krug Assignee: John Zablocki
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
Given the comments on this page: http://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-retrieve-set.html

Perhaps there can be a section on the possible return codes/values, what they mean, when they might happen, and how to deal with them?

 Comments   
Comment by Mark Nunberg [ 22/Nov/12 ]
This is a must-have for SDK testing. It is very difficult to determine failure types and severities without knowing which errors, exception classes, and so on to expect.

At the very least, there should be standard return codes (in a well-defined location) for memcached error codes.
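
As a possible starting point for such a section, here is a sketch of a lookup table for the response status values defined by the memcached binary protocol. It is written in Python for brevity and is not the .NET client's API. The `Status` enum, the `describe` helper, and above all the `RETRYABLE` set are assumptions made for illustration; which statuses a client should actually retry is a policy decision the documentation would need to settle.

```python
from enum import IntEnum


class Status(IntEnum):
    """Response status values from the memcached binary protocol."""
    SUCCESS = 0x0000
    KEY_NOT_FOUND = 0x0001
    KEY_EXISTS = 0x0002
    VALUE_TOO_LARGE = 0x0003
    INVALID_ARGUMENTS = 0x0004
    ITEM_NOT_STORED = 0x0005
    NON_NUMERIC_VALUE = 0x0006   # incr/decr on a non-numeric value
    NOT_MY_VBUCKET = 0x0007      # vbucket belongs to another server
    UNKNOWN_COMMAND = 0x0081
    OUT_OF_MEMORY = 0x0082
    NOT_SUPPORTED = 0x0083
    INTERNAL_ERROR = 0x0084
    BUSY = 0x0085
    TEMPORARY_FAILURE = 0x0086


# ASSUMPTION: treating these as transient is an illustrative policy,
# not something stated in this ticket or in the client documentation.
RETRYABLE = {Status.NOT_MY_VBUCKET, Status.BUSY, Status.TEMPORARY_FAILURE}


def describe(code):
    """Return (name, retryable) for a raw status code."""
    try:
        status = Status(code)
    except ValueError:
        # Codes outside the table are reported as unknown, not retried.
        return ("UNKNOWN_STATUS", False)
    return (status.name, status in RETRYABLE)
```

For example, `describe(0x0001)` reports `KEY_NOT_FOUND` as not retryable, while `describe(0x0007)` reports `NOT_MY_VBUCKET` as retryable under the assumed policy, which is the kind of status one would expect to see while a topology change is in progress.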




Generated at Sun May 19 06:43:12 CDT 2013 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.