Rebalance client failure

jacob_michaelsen · August 27, 2015, 1:57pm

Hi

I am experimenting with a 3-node cluster to see what happens, from a client perspective, when failing over nodes and rebalancing a Memcached bucket called “cache”.

For my test I created 1000 documents with known keys, then added a new node and started a rebalance. As soon as the rebalance starts, my code stops working and never recovers: every read returns a ClientFailure status, with no exception thrown.

Looking at the logs, it appears that I get some kind of authentication failure:

2015-08-27 15:38:15,391 [Read thread #1] DEBUG Couchbase.IO.ConnectionBase - Complete opaque72 on 4b7e3a42-2a17-46b4-9e09-896e8249e9af
2015-08-27 15:38:15,391 [Read thread #1] DEBUG Couchbase.Authentication.SASL.CramMd5Mechanism - Authentication for socket 4b7e3a42-2a17-46b4-9e09-896e8249e9af failed: Auth failure
2015-08-27 15:38:15,391 [Read thread #1] DEBUG Couchbase.IO.Strategies.DefaultIOStrategy - Could not authenticate cache using Couchbase.Authentication.SASL.CramMd5Mechanism - 0e8155b4-08b0-4094-923e-ce03fb5a0aba.
2015-08-27 15:38:15,391 [Read thread #1] DEBUG Couchbase.IO.Strategies.DefaultIOStrategy - System.Security.Authentication.AuthenticationException: cache
at Couchbase.IO.Strategies.DefaultIOStrategy.Authenticate(IConnection connection)
at Couchbase.IO.Strategies.DefaultIOStrategy.Execute[T](IOperation`1 operation)
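
For readers unfamiliar with the log above: CramMd5Mechanism is the SDK's implementation of SASL CRAM-MD5 (RFC 2195). The server sends a random challenge, and the client replies with its username and an HMAC-MD5 of that challenge keyed with the shared secret. The Python sketch below is not the SDK's code. It assumes, as with password-protected buckets on Couchbase Server 3.x, that the bucket name acts as the username and the bucket password as the secret; the function names `cram_md5_response` and `server_accepts` are hypothetical. It illustrates one way to end up with "Auth failure": if the client sends no password or the wrong one, its digest cannot match the one the server computes.

```python
import hashlib
import hmac


def cram_md5_response(username: str, password: str, challenge: bytes) -> str:
    """Build the CRAM-MD5 client response defined in RFC 2195.

    The reply is the username, a space, and the lowercase hex HMAC-MD5 of
    the server's challenge, keyed with the shared secret.
    Assumption: for a password-protected bucket, username = bucket name and
    secret = bucket password.
    """
    digest = hmac.new(password.encode("utf-8"), challenge, hashlib.md5).hexdigest()
    return f"{username} {digest}"


def server_accepts(response: str, stored_password: str, challenge: bytes) -> bool:
    """Hypothetical server-side check: recompute the digest and compare."""
    username, _, _ = response.partition(" ")
    expected = cram_md5_response(username, stored_password, challenge)
    # Constant-time comparison of the two ASCII strings.
    return hmac.compare_digest(response, expected)


if __name__ == "__main__":
    challenge = b"<1896.697170952@postoffice.reston.mci.net>"  # RFC 2195 example challenge
    ok = cram_md5_response("cache", "secret", challenge)
    missing = cram_md5_response("cache", "", challenge)  # password not passed in
    print(server_accepts(ok, "secret", challenge))       # True
    print(server_accepts(missing, "secret", challenge))  # False
```

With the RFC 2195 example values (user `tim`, secret `tanstaaftanstaaf`), `cram_md5_response` returns `tim b913a602c7eda7a495b4e6e7334d3890`. This only explains what the log message means; it does not cover the rebalance-specific bug discussed later in the thread.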

Log file (see line 1146): https://drive.google.com/file/d/0ByD5KWyBWEfGZFMyZEh5YjRKUEE/view?usp=sharing

Code file: https://drive.google.com/file/d/0ByD5KWyBWEfGc09wZVBON0p4czQ/view?usp=sharing

Is this expected behaviour for a Memcached bucket? I don’t see the same behaviour with a Couchbase bucket.

Regards
Jacob

jmorris · August 27, 2015, 11:28pm

@jacob_michaelsen -

No, it’s not expected; it looks like it may be a bug. I’ll dig deeper into your logs and see what I find.

-Jeff

jacob_michaelsen · September 7, 2015, 10:38am

@jmorris Any news on this one?

jmorris · September 9, 2015, 4:59am

@jacob_michaelsen -

Using Couchbase Server 3.1.0 and SDK 2.1.4, I tried to reproduce this with your code and could not. I started with a 3-node cluster, then added a node and rebalanced. I then removed a node, rebalanced, and repeated that once more. Finally, I added both nodes back into the cluster and rebalanced.

I never encountered any authentication failures or ClientFailure statuses. What version of Couchbase Server were you using? Did the Memcached bucket “cache” have a password?

-Jeff

jacob_michaelsen · September 9, 2015, 9:33am

@jmorris -

We are using Couchbase Server 3.0.3 and SDK 2.1.4.

The “cache” bucket does have a password. When I removed it, the rebalance worked as you described, and I did not get any errors.

I also see some strange behaviour in the web interface: ops per second stays at 0 during my load test, and only store operations cause it to increase. On a production cluster, ops per second is reported accurately (I assume). What could cause it to stay at 0? When debugging, I do get bytes back for the documents I read, so there are no functional issues with the client or server.

jmorris · September 9, 2015, 4:20pm

I made a bug ticket for this.

As for the 0 ops per second, I don’t know; I haven’t heard of that. What happens when you remove the Thread.Sleep(1000)?

jmorris · September 9, 2015, 5:56pm

@jacob_michaelsen -

I couldn’t replicate this issue with a password-protected Memcached bucket. I noticed that your code doesn’t pass the password into ClusterHelper.OpenBucket, so you may want to check that. I’ve closed the ticket for now, but if you can provide steps to reproduce, I’ll reopen it.

QE confirmed the bug, so I reopened the ticket.

Also, the 0 ops per second in the Management Console may be a bug in the console: I saw the same thing while doing Upserts, but after switching to another tab and back, it displayed correctly.

-Jeff

jacob_michaelsen · September 10, 2015, 5:24am

@jmorris -

Okay, thank you.

About the 0 ops per second: I am not using Thread.Sleep at the moment, but I will do some more testing on this, perhaps with a fresh install or a rewritten client.

Related topics

Topic	Category	Replies	Views	Activity
ClientFailure when Getting concurrently	.NET SDK	13	4345	April 15, 2015
Server down while removing nodes + rebalancing	Couchbase Server	7	2674	January 26, 2015
Java Client Connection keep closing and reopening, Auth failure	Java SDK	5	9433	September 11, 2013
Failure during rebalance	Couchbase Server	5	5721	July 2, 2013
Rebalance/Architecture	Couchbase Server	1	1901	January 20, 2014