[CCBC-91] timeouts seen after failover, rebalance and add back Created: 13/Aug/12 Updated: 13/Nov/12 Resolved: 18/Aug/12 |
|
| Status: | Closed |
| Project: | Couchbase C client library libcouchbase |
| Component/s: | None |
| Affects Version/s: | 1.0.4 |
| Fix Version/s: | 1.0.5 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Matt Ingenthron | Assignee: | Sergey Avseyev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
PHP 5.3.3 (cli) (built: Jun 27 2012 12:25:48)
libcouchbase1-1.0.4-1
CentOS release 5.8 (Final), x86_64
Couchbase Server 1.8.1 Enterprise |
||
| Attachments: |
|
| Description |
|
scenario:
1. Start PHP client in a loop setting and getting against a 2 node cluster
2. Click failover to kick a node out, click rebalance to make it unassociated
3. Walk through the setup wizard on that node, re-add it to the cluster
4. After adding, click rebalance

Expected behavior: During rebalance in step 4, which is an add node scenario, no timeouts are expected.

Observed behavior: During rebalance in step 4, I see timeouts from PHP, and they continue even after the rebalance has completed. |
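The set/get loop in step 1 can be approximated by a small harness like the one below. This is a sketch in Python, not the PHP script used in the report. `OperationTimeout`, `run_set_get_loop`, and `FlakyClient` are hypothetical names introduced here for illustration. The loop only assumes a client object that exposes `set(key, value)` and `get(key)` and raises an exception on timeout.

```python
class OperationTimeout(Exception):
    """Hypothetical error a client raises when an operation times out."""


def run_set_get_loop(client, iterations):
    """Set then get one key per iteration; return counts of successes and timeouts."""
    stats = {"ok": 0, "timeout": 0}
    for i in range(iterations):
        key = f"key-{i}"
        try:
            client.set(key, str(i))
            client.get(key)
            stats["ok"] += 1
        except OperationTimeout:
            stats["timeout"] += 1
    return stats


class FlakyClient:
    """In-memory stand-in (assumption) that times out on every third key."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        # Simulate a timeout for keys whose numeric suffix is divisible by 3.
        if int(key.split("-")[1]) % 3 == 0:
            raise OperationTimeout(key)
        self.data[key] = value

    def get(self, key):
        return self.data[key]


if __name__ == "__main__":
    print(run_set_get_loop(FlakyClient(), 9))
```

Against a real cluster, the stand-in would be replaced by an actual client, and the timeout count would be watched while steps 2 through 4 are performed in the admin console.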
| Comments |
| Comment by Matt Ingenthron [ 13/Aug/12 ] |
| A packet capture of this same issue, with the client on MacOS X and CentOS 5.8 servers running Couchbase Server 1.8.1 enterprise edition, may be found at http://dl.dropbox.com/u/1537838/failover-maybe-issue |
| Comment by Sergey Avseyev [ 13/Aug/12 ] |
| http://review.couchbase.org/19563 |
| Comment by Matt Ingenthron [ 13/Aug/12 ] |
| Note from discussion: this is a possible fix, but it is not yet confirmed. |
| Comment by Matt Ingenthron [ 13/Aug/12 ] |
|
Sergey and I reproduced the issue, and it is tied to the series of steps outlined above. For some reason the underlying libcouchbase is not receiving the updated configuration, so it sends items to the wrong node and those operations then time out.
Sergey will do more work on finding the specific cause. |
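The effect of a stale configuration on routing can be illustrated with a simplified model. The sketch below is not libcouchbase code. It assumes keys are hashed to a vBucket with CRC32 and that the client picks the node from its local copy of the vBucket map. The node names, the map contents, and the small `NUM_VBUCKETS` value are made up for illustration.

```python
import zlib

NUM_VBUCKETS = 16  # small illustrative value (assumption)


def vbucket_for(key):
    """Map a key to a vBucket id with a CRC32 hash (simplified model)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def node_for(key, vbucket_map):
    """Return the node that the given map says owns the key's vBucket."""
    return vbucket_map[vbucket_for(key)]


# Map the client still holds: after failover, node-a owned every vBucket.
stale_map = ["node-a"] * NUM_VBUCKETS

# Map the cluster uses after node-b is re-added and rebalanced (assumed split).
fresh_map = ["node-a" if vb % 2 == 0 else "node-b" for vb in range(NUM_VBUCKETS)]

keys = [f"key-{i}" for i in range(100)]

# Keys for which the stale map and the fresh map disagree on the owner.
misrouted = [k for k in keys if node_for(k, stale_map) != node_for(k, fresh_map)]

if __name__ == "__main__":
    print(f"{len(misrouted)} of {len(keys)} keys are sent to the wrong node")
```

In this model, every key whose vBucket moved to the re-added node is still sent to the old owner until the client picks up the new map. That matches the observation that timeouts continue after the rebalance has completed.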
| Comment by Sergey Avseyev [ 14/Aug/12 ] |
|
The patch http://review.couchbase.org/19599, together with the aforementioned http://review.couchbase.org/19563, solves the issue.
To reproduce it reliably, fail over the node that the client is currently using to listen for config changes. (Usually it is the first successful node from the initial node list.) |
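The reproduction hint above can be illustrated with a toy model of a client that listens for config changes on a single node. This is a sketch only: `ConfigListener`, its fields, and the `resubscribe` option are hypothetical and are not claimed to reflect libcouchbase internals or what the two patches change.

```python
class ConfigListener:
    """Toy model of a client that streams cluster config from one node."""

    def __init__(self, node_list):
        self.node_list = list(node_list)
        self.source = None        # node currently used for config updates
        self.stream_open = False  # whether the config stream is still connected
        self.config_rev = 1       # last config revision the client has seen

    def connect(self, alive):
        """Subscribe to the first node in the initial list that is reachable."""
        for node in self.node_list:
            if node in alive:
                self.source = node
                self.stream_open = True
                return node
        self.stream_open = False
        return None

    def on_cluster_change(self, alive, new_rev, resubscribe):
        """Apply a cluster change; return True if the client saw the new config."""
        if self.source not in alive:
            self.stream_open = False   # config source was failed over
        if not self.stream_open and resubscribe:
            self.connect(alive)        # pick another node to listen on
        if self.stream_open:
            self.config_rev = new_rev
        return self.stream_open


def replay_scenario(resubscribe):
    """Fail over the config source (rev 2), then re-add it and rebalance (rev 3)."""
    client = ConfigListener(["node-a", "node-b"])
    client.connect({"node-a", "node-b"})
    client.on_cluster_change({"node-b"}, 2, resubscribe)
    client.on_cluster_change({"node-a", "node-b"}, 3, resubscribe)
    return client


if __name__ == "__main__":
    print(replay_scenario(False).config_rev, replay_scenario(True).config_rev)
```

In this model, a client that never re-subscribes stays on the revision it had before the failover, even after the node is added back. A client that re-subscribes to another node keeps following the cluster. Failing over a node other than the config source leaves the stream open, which is consistent with the issue only reproducing reliably when the listening node is the one removed.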