[JCBC-114] Command Futures never receive results after rebalance-out (or other sorts of topology/network changes) Created: 17/Sep/12 Updated: 12/Jun/13 |
|
| Status: | Reopened |
| Project: | Couchbase Java Client |
| Component/s: | docs |
| Affects Version/s: | 1.0.3 |
| Fix Version/s: | 1.1.8 |
| Security Level: | Public |
| Type: | Bug | Priority: | Major |
| Reporter: | Mark Nunberg | Assignee: | Michael Nitschinger |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Comments |
| Comment by Mark Nunberg [ 03/Oct/12 ] |
| This is a real blocker, and seems to be related to a few vbuckets. This issue is preventing me from properly measuring command durations. |
| Comment by Farshid Ghods [ 03/Oct/12 ] |
|
Matt/Rags,
This issue is a blocker for executing more integration tests on the Java SDK. Are there workarounds to avoid this use case, or is a fix on the way? Please assign this back to Mark if more information or logs are needed for this issue. |
| Comment by Matt Ingenthron [ 04/Oct/12 ] |
| Please have a look at this. |
| Comment by Mark Nunberg [ 05/Oct/12 ] |
|
Michael,
I would not try this test manually. The use case in more detail is as follows:
- Single CouchbaseClient object
- 20 user threads: 10 setting and 10 getting the same sorts of kv
- Operations are done asynchronously. They are submitted into a queue which is then checked periodically for isDone/isCancelled.
- 4 node cluster. Nodes are removed, connections are broken.
The issue is that those polling methods never return true unless the futures are retrieved synchronously (i.e. ft.get()), which is actually an accidental detail. |
| Comment by Matt Ingenthron [ 24/Oct/12 ] |
|
We looked at this pretty closely today. The issue here is that the client as designed relies on the get() from the caller to trigger the timeout. An operation will, somewhat correctly, never transition to isDone() or isCancelled() unless someone cares to use it. The scenario that was likely in play over the WAN here is that the request was in flight to the server while the config was in flight down to the client. It arrives at the server, but is never responded to. Since the get() is never called, it'll never time out and transition to the canceled state. We recommend you change the test code to use the queue more like a queue and just get() each one. Iterating through the queue is a bit funny in the first place, but if using the get() on the Future objects, you'll still have asynchronous behavior and much of the time the get() will be returning since the data is already there. |
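The recommendation above (treat the queue as a queue and call get() on each Future) might look roughly like the sketch below. It uses only java.util.concurrent so it can run without a cluster: a CompletableFuture that is never completed stands in for an operation whose response was lost, and the class name FutureDrain, the drain helper, and the outcome strings are hypothetical names chosen for illustration, not part of the Couchbase client. The explicit cancel() after a timeout is also an assumption of this sketch. With the real client, the futures would come from the asynchronous set/get calls instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureDrain {
    /**
     * Hypothetical helper: takes futures off the queue in FIFO order and
     * calls get() with a timeout on each one, so an operation that never
     * receives a response is timed out instead of staying pending forever.
     */
    static List<String> drain(Queue<Future<String>> pending, long timeoutMs) {
        List<String> outcomes = new ArrayList<>();
        Future<String> f;
        while ((f = pending.poll()) != null) {
            try {
                // Returns immediately when the result is already there.
                outcomes.add("ok:" + f.get(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Assumption of this sketch: cancel explicitly after a timeout.
                f.cancel(true);
                outcomes.add("timeout");
            } catch (CancellationException e) {
                outcomes.add("cancelled");
            } catch (ExecutionException e) {
                outcomes.add("failed");
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop draining.
                Thread.currentThread().interrupt();
                outcomes.add("interrupted");
                break;
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Queue<Future<String>> pending = new ArrayDeque<>();
        pending.add(CompletableFuture.completedFuture("value"));
        // Never completed: stands in for a request whose response was lost.
        pending.add(new CompletableFuture<String>());
        System.out.println(drain(pending, 50)); // prints [ok:value, timeout]
    }
}
```

Submission stays asynchronous in this pattern. Only the draining side blocks, and only for as long as the timeout allows when a result has not yet arrived.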
| Comment by Matt Ingenthron [ 24/Oct/12 ] |
| This behavior should be better documented, both in the javadoc and in the API reference. |