[JCBC-114] Command Futures never receive results after rebalance-out (or other sorts of topology/network changes) Created: 17/Sep/12  Updated: 06/Sep/13

Status: Reopened
Project: Couchbase Java Client
Component/s: Documentation
Affects Version/s: 1.0.3
Fix Version/s: .future
Security Level: Public

Type: Bug Priority: Major
Reporter: Mark Nunberg Assignee: Michael Nitschinger
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Dependency
depends on JCBC-117 mention that OperationFuture.get(tmo)... Reopened

 Comments   
Comment by Mark Nunberg [ 03/Oct/12 ]
This is a real blocker, and seems to be related to a few vbuckets. This issue is preventing me from properly measuring command durations
Comment by Farshid Ghods (Inactive) [ 03/Oct/12 ]
Matt/Rags,

This issue is a blocker for executing more integration tests on java sdk. are there workarounds to avoid this use case or a fix on the way ?
Please assign this back to Mark if more information or logs needed for this issue
Comment by Matt Ingenthron [ 04/Oct/12 ]
Please have a look at this.
Comment by Mark Nunberg [ 05/Oct/12 ]
Michael,

I would not try this test manually.. the use case in more detail is as follows:

- Single CouchbaseClient object
- 20 user threads. 10 setting and 10 getting the same sorts of kv
- Operations are done asynchronously. They are submitted into a queue which is then checked periodically for isDone/isCancelled.
- 4 node cluster. Nodes are removed, connections are broken

The issue is those polling methods never returning true, unless they are retrieved synchronously (i.e. ft.get()).. which is actually an accidental detail
Comment by Matt Ingenthron [ 24/Oct/12 ]
We looked at this pretty closely today. The issue here is that the client as designed relies on the get() from the caller to trigger the timeout. An operation will, somewhat correctly, never transition to isDone() or isCancelled() unless someone cares to use it.

The scenario that was likely in play over the WAN here is that the request was in flight to the server while the config was in flight down to the client. It arrives at the server, but is never responded to. Since the get() is never called, it'll never time out and transition to the canceled state.

We recommend you change the test code to use the queue more like a queue and just get() each one. Iterating through the queue is a bit funny in the first place, but if using the get() on the Future objects, you'll still have asynchronous behavior and much of the time the get() will be returning since the data is already there.
Comment by Matt Ingenthron [ 24/Oct/12 ]
This behavior should be better documented, both in the javadoc and in the API reference.
Generated at Thu Aug 28 07:06:40 CDT 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.