Details
Description
In earlier tests of reconnecting to a node on failover, we used the default memcached bucket. When we tested the same scenario with a non-default bucket, we noticed that the client did not reconnect (due to an internal NullPointerException). I have attached the SDK logs for this scenario, in which we used the "IndexByLniataData" memcached bucket. The problem occurs when the node is added back after a failover.
11:34:43,411 DEBUG [Memcached IO over {MemcachedConnection to /10.14.5.119:11210}] [CouchbaseMemcachedConnection] Selecting with delay of 3038ms
Exception in thread "Thread-3" java.lang.NullPointerException
at net.spy.memcached.auth.AuthThread.buildOperation(AuthThread.java:117)
at net.spy.memcached.auth.AuthThread.run(AuthThread.java:86)
Logs/stack trace attached.
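One reading of why only the non-default bucket shows the problem: the default bucket is typically accessed without SASL credentials, so the authentication path is never exercised on reconnect, while a named bucket such as "IndexByLniataData" must re-authenticate every time a connection is reopened. The sketch below is a hypothetical model, not spymemcached's `AuthThread`. The class `ReconnectAuth`, the `AuthDescriptor` record, `needsAuth` and `buildAuthOperation` are illustrative names, and the mechanism preference list is an assumption. It shows the kind of guard that turns missing auth state into a descriptive error rather than a bare `NullPointerException` deep inside a thread.

```java
import java.util.List;
import java.util.Objects;

public class ReconnectAuth {
    // Hypothetical stand-in for the credentials a client keeps for one bucket.
    public record AuthDescriptor(String bucket, String password) {}

    // Mechanisms this hypothetical client can use, strongest first (assumption).
    static final List<String> CLIENT_PREFERENCE = List.of("CRAM-MD5", "PLAIN");

    // A bucket accessed without credentials (e.g. "default") skips the auth step.
    public static boolean needsAuth(AuthDescriptor auth) {
        return auth != null;
    }

    // Describes the SASL step to run after a reconnect, failing with a
    // descriptive message when required state is missing.
    public static String buildAuthOperation(AuthDescriptor auth, List<String> serverMechs) {
        Objects.requireNonNull(auth, "no credentials retained for reconnect");
        if (serverMechs == null || serverMechs.isEmpty()) {
            throw new IllegalStateException(
                "server listed no SASL mechanisms for bucket " + auth.bucket());
        }
        // Pick the first mechanism, in client preference order, that the server offers.
        String mech = CLIENT_PREFERENCE.stream()
            .filter(serverMechs::contains)
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no common SASL mechanism"));
        return mech + " auth for bucket " + auth.bucket();
    }

    public static void main(String[] args) {
        System.out.println(needsAuth(null)); // default bucket: no auth step
        AuthDescriptor named = new AuthDescriptor("IndexByLniataData", "secret");
        System.out.println(buildAuthOperation(named, List.of("PLAIN", "CRAM-MD5")));
    }
}
```

Checking the mechanism list and the retained credentials up front means a failed re-add surfaces in the logs with the bucket name attached, which is the information missing from the trace above.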
Activity
Raghavan Srinivas
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Fix Version/s | 1.1beta [ 10370 ] | |
| Resolution | | Incomplete [ 4 ] |
Raghavan Srinivas
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Resolution | Incomplete [ 4 ] | |
| Status | Resolved [ 5 ] | Reopened [ 4 ] |
Matt Ingenthron
made changes -
Matt Ingenthron
made changes -
Michael Nitschinger
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | | 1.0.4 [ 10364 ] |
Raghavan Srinivas
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Reopened [ 4 ] | In Progress [ 3 ] |
Matt Ingenthron
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Raghavan Srinivas [ rags ] | Matt Ingenthron [ ingenthr ] |
Michael Nitschinger
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Priority | Major [ 3 ] | Blocker [ 1 ] |
Matt Ingenthron
made changes -
Matt Ingenthron
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | In Progress [ 3 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
Matt Ingenthron
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Resolution | Fixed [ 1 ] | |
| Status | Resolved [ 5 ] | Reopened [ 4 ] |
| Assignee | Matt Ingenthron [ ingenthr ] | Michael Nitschinger [ daschl ] |
Michael Nitschinger
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | 1.1.1 [ 10430 ] | |
| Fix Version/s | 1.0.4 [ 10364 ] | |
| Fix Version/s | | 1.1-beta [ 10370 ] |
Michael Nitschinger
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | 1.1.2 [ 10480 ] | |
| Fix Version/s | | 1.1.1 [ 10430 ] |
Michael Nitschinger
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Reopened [ 4 ] | Resolved [ 5 ] |
| Resolution | | Duplicate [ 3 ] |
Michael Nitschinger
made changes -
There is already a safeguard: the continuous timeout threshold will kick in, and the connection will then be rebuilt. I don't know whether this issue comes up every time, but assuming it's a rare event, we'd see 1000 operations time out (the default threshold) followed by the connection being rebuilt.
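The safeguard described above behaves like a per-connection counter of consecutive timeouts. A minimal sketch, assuming the counter resets on any successful operation and that reaching the threshold triggers exactly one rebuild; `TimeoutTracker`, `onTimeout`, `onSuccess` and the rebuild callback are hypothetical names, and 1000 mirrors the default quoted in the comment rather than a value read from the client.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a "continuous timeout" safeguard; not the client's code.
public class TimeoutTracker {
    private final int threshold;
    private final Runnable rebuild;
    private final AtomicInteger consecutive = new AtomicInteger();
    private final AtomicInteger rebuilds = new AtomicInteger();

    public TimeoutTracker(int threshold, Runnable rebuild) {
        this.threshold = threshold;
        this.rebuild = rebuild;
    }

    // Called when an operation on this connection times out.
    public void onTimeout() {
        // Only the caller that hits the threshold exactly triggers the rebuild.
        if (consecutive.incrementAndGet() == threshold) {
            consecutive.set(0); // count afresh against the rebuilt connection
            rebuilds.incrementAndGet();
            rebuild.run();
        }
    }

    // Any successful operation breaks the streak of timeouts.
    public void onSuccess() {
        consecutive.set(0);
    }

    public int rebuildCount() {
        return rebuilds.get();
    }

    public static void main(String[] args) {
        TimeoutTracker t = new TimeoutTracker(1000,
            () -> System.out.println("rebuilding connection"));
        for (int i = 0; i < 999; i++) t.onTimeout();
        t.onSuccess(); // streak broken just short of the threshold: no rebuild
        for (int i = 0; i < 1000; i++) t.onTimeout();
        System.out.println("rebuilds: " + t.rebuildCount());
    }
}
```

This also shows the cost of relying on the safeguard for this bug: a connection that never authenticates is only recovered after a full streak of failed operations.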
We'd have to add some diagnostic information to the client and reliably reproduce this to identify the issue. I think the scenario is:
1) set up a cluster of, say, 3 nodes
2) configure a client and have it work with an authenticated memcached bucket on the cluster
3) fail over a node by clicking "failover" in the console
4) add the node back by clicking "add back"
Is this correct?
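The four steps above can be mirrored in a small simulation before touching a real cluster, which helps decide where diagnostic output belongs. Everything here is hypothetical: `FailoverScenario`, `ConnState` and the event strings model a 3-node cluster and a client bound to an authenticated bucket; they do not call the Couchbase REST API or the Java client. The key assumption being modeled is that "add back" forces a fresh connection, which must authenticate again with the retained bucket credentials.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailoverScenario {
    // Hypothetical per-node connection state held by the client.
    public enum ConnState { AUTHENTICATED, DISCONNECTED }

    private final String bucket;
    private final String password;
    private final Map<String, ConnState> connections = new LinkedHashMap<>();
    private final List<String> events = new ArrayList<>(); // diagnostic trace

    // Steps 1 and 2: a cluster of nodes and a client using an authenticated bucket.
    public FailoverScenario(List<String> nodes, String bucket, String password) {
        this.bucket = bucket;
        this.password = password;
        nodes.forEach(this::connect);
    }

    // Opening a connection to an authenticated bucket always re-runs auth.
    private void connect(String node) {
        if (password == null) {
            throw new IllegalStateException("no password retained for bucket " + bucket);
        }
        connections.put(node, ConnState.AUTHENTICATED);
        events.add("auth " + node);
    }

    // Step 3: "failover" in the console; the client loses that node.
    public void failover(String node) {
        connections.put(node, ConnState.DISCONNECTED);
        events.add("failover " + node);
    }

    // Step 4: "add back"; the client must reconnect and authenticate again.
    public void addBack(String node) {
        events.add("add back " + node);
        connect(node);
    }

    public ConnState state(String node) { return connections.get(node); }

    public List<String> events() { return events; }

    public static void main(String[] args) {
        // Node names are placeholders, not real addresses.
        FailoverScenario s = new FailoverScenario(
            List.of("node1", "node2", "node3"), "IndexByLniataData", "secret");
        s.failover("node2");
        s.addBack("node2");
        System.out.println(s.state("node2") + " " + s.events());
    }
}
```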