Details
Description
While investigating an issue in the Java client library with retrying operations on not-my-vbucket responses, I noticed that at the end of a rebalance that removes a server, the server being removed drops the connection while operations are still in flight.
Ideally, there would be a period after the takeover, while the bucket transitions from active to dead, during which the server responds only with not-my-vbucket replies.
Unfortunately, with the current behavior, application code at best needs more complex failure-handling logic. At worst, if the application does not handle the failure, it could lead to data loss.
The challenge here is determining how long that period should be. Some clients do not disconnect on their own, and there is no polite hangup mechanism on the server side.
The attached log demonstrates the issue, and the attached test program reproduces it. The test was carried out as follows:
1) Set up a 3-node cluster with a default bucket of the Couchbase type
2) Start the test program; the first argument is the number of seconds to run, and the remaining arguments are the hostnames/IPs of the nodes in the cluster
3) Remove a node from the cluster
Expected behavior: All operations sent to the server receive a not-my-vbucket reply and are rescheduled as we receive config updates from the server.
Observed behavior: At the end of the remove server/rebalance cycle, the connection is dropped and the client cancels the in-flight operations, since it cannot know the status of those operations.
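To make the difference between the two behaviors concrete, here is a minimal sketch of the decision a client faces for each in-flight operation. It is not the Java client library's actual code; the class `RetrySketch`, the enums `Outcome` and `Action`, and the method `decide` are hypothetical names chosen for illustration only.

```java
public class RetrySketch {
    /** Hypothetical outcomes for an operation that was sent to a node. */
    enum Outcome { SUCCESS, NOT_MY_VBUCKET, CONNECTION_DROPPED }

    /** Hypothetical next steps the client can take for that operation. */
    enum Action { COMPLETE, RESCHEDULE, CANCEL }

    /** Maps an outcome to the client's next step (illustrative only). */
    static Action decide(Outcome outcome) {
        switch (outcome) {
            case SUCCESS:
                return Action.COMPLETE;
            case NOT_MY_VBUCKET:
                // The server explicitly rejected the operation, so it can be
                // rescheduled once an updated config arrives.
                return Action.RESCHEDULE;
            default:
                // Connection dropped: the client cannot tell whether the
                // operation was applied, so it cancels rather than retries.
                return Action.CANCEL;
        }
    }

    public static void main(String[] args) {
        for (Outcome o : Outcome.values()) {
            System.out.println(o + " -> " + decide(o));
        }
    }
}
```

With the expected behavior, every in-flight operation during the removal would take the `RESCHEDULE` path. With the observed behavior, operations in flight when the connection drops take the `CANCEL` path, and the application has to handle the failure itself.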
Activity
Matt Ingenthron
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Attachment | MB-5406-droppedatend.log [ 13397 ] | |
| Attachment | MB-5406-testprogram.tar.gz [ 13398 ] |
Farshid Ghods
made changes -
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] |
Aleksey Kondratenko
made changes -
| Assignee | Dipti Borkar [ dipti ] | Trond Norbye [ trond ] |
Aleksey Kondratenko
made changes -
| Assignee | Trond Norbye [ trond ] | Dipti Borkar [ dipti ] |
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] |
Farshid Ghods
made changes -
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] |
Farshid Ghods
made changes -
| Assignee | Dipti Borkar [ dipti ] | Aleksey Kondratenko [ alkondratenko ] |
Aleksey Kondratenko
made changes -
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Dipti Borkar [ dipti ] |
Farshid Ghods
made changes -
| Fix Version/s | 2.0-beta [ 10113 ] | |
| Fix Version/s | 2.0-developer-preview-5 [ 10290 ] |
Dipti Borkar
made changes -
| Assignee | Dipti Borkar [ dipti ] | Aleksey Kondratenko [ alkondratenko ] |
Peter Wansch
made changes -
| Fix Version/s | 2.0 [ 10114 ] | |
| Fix Version/s | 2.0-beta [ 10113 ] |
Aleksey Kondratenko
made changes -
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | Fixed [ 1 ] |
Farshid Ghods
made changes -
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Andrei Baranouski [ andreibaranouski ] |
Andrei Baranouski
made changes -
| Status | Resolved [ 5 ] | Closed [ 6 ] |