[JCBC-70] Client fails to reconnect to server of non-default memcached bucket after failover and add back Created: 28/Jun/12 Updated: 30/Jan/13 Resolved: 30/Jan/13 |
|
| Status: | Resolved |
| Project: | Couchbase Java Client |
| Component/s: | library |
| Affects Version/s: | 1.0.3 |
| Fix Version/s: | 1.1.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Perry Krug | Assignee: | Michael Nitschinger |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | customer | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
In earlier tests of reconnecting to a node on failover, we used the default memcached bucket. When we tested the same scenario with a non-default bucket, the client did not reconnect (due to an internal null pointer exception). I have attached the SDK logs for this scenario, in which we used the "IndexByLniataData" memcached bucket. The problem appears when adding the node back after a failover.
11:34:43,411 DEBUG [Memcached IO over {MemcachedConnection to /10.14.5.119:11210}] [CouchbaseMemcachedConnection] Selecting with delay of 3038ms
Exception in thread "Thread-3" java.lang.NullPointerException
    at net.spy.memcached.auth.AuthThread.buildOperation(AuthThread.java:117)
    at net.spy.memcached.auth.AuthThread.run(AuthThread.java:86)
Logs/stack trace attached. |
| Comments |
| Comment by Matt Ingenthron [ 22/Aug/12 ] |
|
I've spent a bit of time analyzing this issue, and it's not clear what the cause is. It is correct though that this would cause the auth thread to die, and as such authentication to the node would never complete.
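The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the spymemcached `AuthThread` code; the class `AuthThreadDeathSketch`, the method `authenticate`, and the `mechanism` parameter are hypothetical names chosen for illustration. It only shows that an uncaught `NullPointerException` ends the worker thread, so the completion signal the caller is waiting for never arrives.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not the real AuthThread, only the general failure mode.
class AuthThreadDeathSketch {

  /**
   * Starts a worker that dereferences the given mechanism name and then
   * signals completion. Returns true if completion was signalled in time.
   */
  static boolean authenticate(final String mechanism, long waitMillis) {
    final CountDownLatch done = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      // Throws NullPointerException when mechanism is null. The exception
      // escapes run(), the thread terminates, and countDown() never runs.
      int length = mechanism.length();
      if (length >= 0) {
        done.countDown();
      }
    }, "auth-sketch");
    // Report the death of the thread in one line instead of a full stack trace.
    worker.setUncaughtExceptionHandler(
        (t, e) -> System.err.println(t.getName() + " died: " + e));
    worker.start();
    try {
      return done.await(waitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(authenticate("PLAIN", 500)); // worker completes
    System.out.println(authenticate(null, 500));    // worker dies, wait times out
  }
}
```

In this sketch the caller at least notices the missing signal because it waits with a timeout; nothing in the sketch says how the real client waits for authentication.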
There is already a safeguard: the continuous timeout threshold will kick in and the connection will then be rebuilt. I don't know whether this issue comes up every time, but assuming it's a rare event, we'd see 1000 operations time out (by default), followed by the connection being rebuilt. We'd have to add some diagnostic information to the client and reliably reproduce this to identify the issue. I think the scenario is:
1) set up a cluster of, say, 3 nodes
2) configure a client and have it work with an authenticated memcached bucket on the cluster
3) fail over a node by clicking "failover" in the console
4) add the node back by clicking "add back"
Is this correct? |
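The continuous timeout safeguard mentioned in the comment above can be sketched as a counter of consecutive timeouts. This is an illustrative model only, not the client's implementation: the class `ContinuousTimeoutSketch` and its methods are hypothetical, and whether the real client rebuilds on reaching or on exceeding the threshold is an assumption (this sketch rebuilds when the count reaches it). The value 1000 is the default quoted in the comment.

```java
// Hypothetical sketch of a continuous-timeout safeguard; all names are illustrative.
class ContinuousTimeoutSketch {
  private final int threshold;
  private int consecutiveTimeouts = 0;
  private int rebuilds = 0;

  ContinuousTimeoutSketch(int threshold) {
    this.threshold = threshold;
  }

  /** Records one operation result; returns true if this call triggered a rebuild. */
  boolean record(boolean timedOut) {
    if (!timedOut) {
      // Any successful operation breaks the run of timeouts.
      consecutiveTimeouts = 0;
      return false;
    }
    consecutiveTimeouts++;
    if (consecutiveTimeouts >= threshold) {
      // Assumed behavior: rebuild the connection and start counting again.
      consecutiveTimeouts = 0;
      rebuilds++;
      return true;
    }
    return false;
  }

  int rebuilds() {
    return rebuilds;
  }

  public static void main(String[] args) {
    ContinuousTimeoutSketch tracker = new ContinuousTimeoutSketch(1000);
    for (int i = 0; i < 999; i++) {
      tracker.record(true);
    }
    System.out.println(tracker.rebuilds()); // prints 0
    tracker.record(true);
    System.out.println(tracker.rebuilds()); // prints 1
  }
}
```

Under this model a node whose authentication never completes would cost a full run of timed-out operations before the connection is rebuilt, which matches the cost described in the comment.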
| Comment by Perry Krug [ 23/Aug/12 ] |
| That appears correct. The customer has been able to reproduce this reliably, but since so much time has passed I would be hesitant to go back to them unless necessary... |
| Comment by Matt Ingenthron [ 09/Jan/13 ] |
| There is an open changeset for this. Please determine whether it is correct and needs to go in. |
| Comment by Michael Nitschinger [ 30/Jan/13 ] |
| Duplicate of Spy-111 |