frequent 'not_my_vbucket' errors hitting cluster via a load balancer

I have a three-node cluster running Couchbase Server 2.0.1 Community (couchbase-server-community_x86_64_2.0.1.rpm in particular) behind an F5 BIG-IP load balancer.

Using the test code (below the exception), if I hit each of the hosts individually I see no errors, no matter how many times I run the script. But if I hit the load-balanced host, I *frequently* get the following exception:

2013-07-11 16:36:40.000 ERROR net.spy.memcached.protocol.ascii.StoreOperationImpl:  Error:  SERVER_ERROR a2b not_my_vbucket
Exception in thread "main" java.util.concurrent.ExecutionException: OperationException: SERVER: SERVER_ERROR a2b not_my_vbucket
	at net.spy.memcached.MemcachedClient$OperationFuture.get(MemcachedClient.java:1659)
	at notMyBucket.NotMyBucketTest.main(NotMyBucketTest.java:22)
Caused by: OperationException: SERVER: SERVER_ERROR a2b not_my_vbucket
	at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:122)
	at net.spy.memcached.protocol.ascii.OperationImpl.readFromBuffer(OperationImpl.java:130)
	at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:356)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:305)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:192)
	at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:1444)
2013-07-11 16:36:40.002 INFO net.spy.memcached.MemcachedConnection:  Reconnecting due to exception on {QA sa=couchbase/10.93.193.146:11216, #Rops=1, #Wops=0, #iq=0, topRop=net.spy.memcached.protocol.ascii.StoreOperationImpl@18a61164, topWop=null, toWrite=0, interested=1}
OperationException: SERVER: SERVER_ERROR a2b not_my_vbucket
	at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:122)
	at net.spy.memcached.protocol.ascii.OperationImpl.readFromBuffer(OperationImpl.java:130)
	at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:356)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:305)
	at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:192)
	at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:1444)

"Frequently" meaning probably 2/3 of the times I run the script. It happens whether I flush the bucket first or not.

Here's my test code:

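import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class NotMyBucketTest {

	public static void main(String[] args)
		throws Exception
	{
		int timeout = 500; // milliseconds, used for each set and for shutdown
		String hostString = "couchbase:11216"; // host:port under test

		MemcachedClient cl = new MemcachedClient( AddrUtil.getAddresses( hostString ) );

		// Store 39 small keys and print the result of each set.
		for( int i = 1 ; i < 40 ; i++ ) {
			System.out.println( cl.set( "s" + i, 0, "v" + i ).get( timeout, TimeUnit.MILLISECONDS ) ) ;
		}

		cl.shutdown(timeout, TimeUnit.MILLISECONDS);
	}
}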

(I'm using memcached-2.3.jar in this case.)

It does not seem to matter what port I use (in particular default vs non-default). We seem to have the same problem using a Python library, though it doesn't report any details (hence this Java test program).

We want to use the load-balancer so that we don't need to update multiple configurations whenever we add/remove nodes.

From my reading, "not_my_vbucket" should be handled by Moxi, and not passed back to the client?

Any ideas how to get around this?

Thanks.

1 Answer

Hello,

The short answer is: do not use a load balancer.

From a resource management point of view:
You do not need a load balancer, because the Couchbase cluster manages all of its nodes itself and updates the configuration on every node when you add or remove one.

From an application point of view:
The way Couchbase shards (distributes) the data is not compatible with a load balancer. There is only a single active copy of each item in the cluster, so the client calculates which node each operation (for example: get, set, add, remove, ...) must be sent to.
If you put a load balancer in between, the operation the client sends may not reach the proper node, and the node that receives it answers with the not_my_vbucket error you are seeing.
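
To make this concrete, here is a rough sketch of the idea. It is only an illustration and not the actual hashing or vBucket map used by Couchbase or its client libraries: the class name, the node names, the count of 1024 vBuckets, the plain CRC32-modulo hash and the "vBucket i lives on node i % 3" ownership rule are all assumptions made up for the example. It compares sending each key straight to the node that owns it with sending it to whichever node a round-robin balancer happens to pick.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class VBucketRoutingSketch {

    // Assumed values, for illustration only.
    static final int NUM_VBUCKETS = 1024;
    static final String[] NODES = { "node-a", "node-b", "node-c" };

    // Hypothetical vBucket map: vBucket i is active on node (i % 3).
    // A real cluster publishes its own map and changes it on rebalance.
    public static int ownerOf(int vbucket) {
        return vbucket % NODES.length;
    }

    // Simplified key hashing: CRC32 of the key, reduced to a vBucket id.
    public static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    // What a node answers when it receives a store for the given key:
    // it accepts the operation only if it holds the active vBucket.
    public static String send(int nodeIndex, String key) {
        return ownerOf(vbucketFor(key)) == nodeIndex ? "STORED" : "not_my_vbucket";
    }

    public static void main(String[] args) {
        int roundRobin = 0;
        for (int i = 1; i < 40; i++) {
            String key = "s" + i;
            int direct = ownerOf(vbucketFor(key));        // node chosen by the client
            int balanced = roundRobin++ % NODES.length;    // node chosen by the balancer
            System.out.println(key
                    + " direct(" + NODES[direct] + ")=" + send(direct, key)
                    + " balanced(" + NODES[balanced] + ")=" + send(balanced, key));
        }
    }
}
```

In this toy model the "direct" column is always STORED, because the client picked the owning node itself, while the "balanced" column fails whenever the balancer's choice is not the owner. That is the same mismatch that produces the error behind a real load balancer.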

Let me know if you need more information.

Tug
@tgrall