[NCBC-257] During rebalance client tries to connect the primary node only Created: 02/May/13 Updated: 15/May/13 |
|
| Status: | Open |
| Project: | Couchbase .NET client library |
| Component/s: | library |
| Affects Version/s: | 1.2.6 |
| Fix Version/s: | 1.2.7 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Saakshi Manocha | Assignee: | Saakshi Manocha |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I'm adding this bug to track the performance issue raised in CBSE-521 and CBSE-528.
During the sdkd scenario tests, we observed that while a rebalance is in progress, the client tries to connect only to the primary node and does not connect to the other (secondary) nodes in the cluster. During rebalance the topology changes, and this produces many errors such as socket reset, no response received, and operation timeout. These errors go away when the rebalance is over; in the rebound phase, no errors are observed.
Please see some sample reports:
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_fo-ept-rb-Sdotnet-1.2-release-T2013-04-02-00.11.35-LV_MC_BASIC.txt
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_rb-2-in-Sdotnet-1.2-release-T2013-04-02-00.21.03-LV_HTTP_BASIC.txt
http://sdk-testresults.couchbase.com.s3.amazonaws.com/sdkd/HWIN-335SPEPOCGT-IHYBRID_fo-ept-eject-Sdotnet-1.2-release-T2013-04-02-00.17.30-LV_HTTP_BASIC.txt
Mark, I need your input here too: do you think these errors during rebalance can impact performance or stability at a customer site? |
| Comments |
| Comment by Saakshi Manocha [ 02/May/13 ] |
|
Also, as per the documentation and our understanding, we can expect errors during the CHANGE phase, and ideally they should go away in the REBOUND phase.
CHANGE: Here we see errors start to happen, because a cluster topology change started around this time. We can expect errors until the topology change is completed. In this case, the topology change was adding a single node to the cluster.
REBOUND: Here we see the errors stop, because the topology change has been completed. Since we added an extra node to the cluster, the rate of operations has actually gone up compared with before, because there are more nodes to handle requests now. |
| Comment by Matt Ingenthron [ 10/May/13 ] |
| This appears to be a critical issue. Marking as blocker for 1.2.7 until we have a better understanding. |
| Comment by John Zablocki [ 15/May/13 ] |
| When you say "connect to the primary node only", are you referring to the streaming connection, or do you mean that all ops are going to the primary node? |
[NCBC-78] Enhance discussion of return codes/values Created: 26/Jun/12 Updated: 07/Dec/12 |
|
| Status: | Open |
| Project: | Couchbase .NET client library |
| Component/s: | docs |
| Affects Version/s: | 1.1.6 |
| Fix Version/s: | 1.3 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Perry Krug | Assignee: | John Zablocki |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
Given the comments on this page: http://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-retrieve-set.html
Perhaps there can be a section on the possible return codes/values, what they mean, when they might happen, and how to deal with them? |
| Comments |
| Comment by Mark Nunberg [ 22/Nov/12 ] |
|
This is a must-have for SDK testing. It is very difficult to determine failure types and severities without knowing which errors, exception classes, and so on to expect.
At the very least, there should be standard return codes (in a well-defined location) for memcached error codes. |
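As a rough illustration of what "standard return codes in a well-defined location" could look like, here is a minimal sketch in Python (not the .NET client's actual API). The numeric values are the status codes defined by the memcached binary protocol; the names `MemcachedStatus`, `RETRYABLE`, and `describe`, and the choice of which codes count as retryable, are assumptions made for this example only.

```python
from enum import IntEnum


class MemcachedStatus(IntEnum):
    """Status values from the memcached binary protocol response header."""
    SUCCESS = 0x0000
    KEY_NOT_FOUND = 0x0001
    KEY_EXISTS = 0x0002
    VALUE_TOO_LARGE = 0x0003
    INVALID_ARGUMENTS = 0x0004
    ITEM_NOT_STORED = 0x0005
    NON_NUMERIC_VALUE = 0x0006   # incr/decr on a non-numeric value
    NOT_MY_VBUCKET = 0x0007      # the vbucket belongs to another server
    UNKNOWN_COMMAND = 0x0081
    OUT_OF_MEMORY = 0x0082
    NOT_SUPPORTED = 0x0083
    INTERNAL_ERROR = 0x0084
    BUSY = 0x0085
    TEMPORARY_FAILURE = 0x0086


# Assumption for illustration: codes a test harness might treat as
# transient (for example while the cluster topology is changing).
RETRYABLE = frozenset({
    MemcachedStatus.NOT_MY_VBUCKET,
    MemcachedStatus.BUSY,
    MemcachedStatus.TEMPORARY_FAILURE,
})


def describe(code: int) -> str:
    """Return a readable name and a coarse category for a raw status code."""
    try:
        status = MemcachedStatus(code)
    except ValueError:
        # Codes outside the table are reported rather than guessed at.
        return f"UNKNOWN_STATUS(0x{code:04x})"
    if status is MemcachedStatus.SUCCESS:
        kind = "ok"
    elif status in RETRYABLE:
        kind = "retryable"
    else:
        kind = "not-retryable"
    return f"{status.name} ({kind})"
```

With a table like this, a test harness could bucket failures by calling, say, `describe(0x0007)` on the raw status of each response instead of parsing exception messages.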