Hi, we are using Couchbase Java client 1.4.5, and we find that FailureMode.Redistribute does not work as expected.
As the API documentation says:
Move on to functional nodes when nodes fail.
In this failure mode, the failure of a node will cause its current queue and future requests to move to the next logical node in the cluster for a given key.
However, when we manually drop all requests on a Couchbase Server node, the client still tries to push some documents to that failed node instead of redistributing them to other nodes.
So here is my question:
1. How does the client determine which node has failed? Do we need to manually remove the failed node from the node list in the client?
2. If it detects the failed node, why does it still try to connect to the same node?
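For reference, this is roughly how we enable the failure mode when building the client; hosts, bucket name, and password below are placeholders, and the builder call is based on the 1.4.x `CouchbaseConnectionFactoryBuilder` API:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.CouchbaseConnectionFactoryBuilder;
import net.spy.memcached.FailureMode;

public class RedistributeSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster bootstrap list.
        List<URI> nodes = Arrays.asList(URI.create("http://node1:8091/pools"));

        CouchbaseConnectionFactoryBuilder builder = new CouchbaseConnectionFactoryBuilder();
        // Ask the client to move queued and future operations to the next
        // logical node when a node fails.
        builder.setFailureMode(FailureMode.Redistribute);

        CouchbaseClient client = new CouchbaseClient(
            builder.buildCouchbaseConnection(nodes, "default", ""));

        // ... perform operations ...

        client.shutdown();
    }
}
```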
If you use FailureMode.Redistribute, the Java client will requeue the operation until the master node for the partition comes back online or is failed over to a different node. So we are trying to achieve best-effort delivery up to the point where the operation times out.
If you are using FailureMode.Cancel, what happens is that instead of requeuing the operation, we just cancel it right away, so you get fail-fast behavior.
Note that the difference from regular memcached buckets here is that there is only ever one specific node where we can find the data, so we need to wait until it is accessible again or cancel the operation.
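For illustration, with FailureMode.Cancel the cancellation surfaces on the returned future. A hedged sketch, assuming an already-connected `CouchbaseClient` named `client` (the exact exception wrapping may differ between client versions):

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;

import net.spy.memcached.internal.OperationFuture;

// ... inside a method with a connected CouchbaseClient `client` ...
OperationFuture<Boolean> future = client.set("mykey", 0, "value");
try {
    if (!future.get()) {
        // The operation completed but was not stored; inspect the status.
        System.err.println("set not stored: " + future.getStatus().getMessage());
    }
} catch (ExecutionException e) {
    if (e.getCause() instanceof CancellationException) {
        // Fail-fast path: the target node was unreachable and the
        // operation was cancelled instead of being requeued.
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
```

With Redistribute, by contrast, the same call would typically block until the operation succeeds or times out.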
Thank you so much for your reply, really helpful.
I understand the request will wait for the master node to be failed over. However, what if the master node is fine in the cluster but the client has a connection issue with that node? In that case redistribution will always fail, since the request always goes to the same node.
So my question is whether there is a way to rebuild the hash so the key goes to another node instead of the disconnected one?
From the client's point of view it doesn't matter whether you have a connection issue or the node is really down: it can't access the node, so it's not able to read or write data for that partition.
Keep in mind that Couchbase wants to make sure your data is safe, so we cannot just write to another node. There is only ever one node that accepts writes for a given partition. You can read from replicas, though, if you are clear about the tradeoffs (data consistency).
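If stale reads are acceptable, a sketch of falling back to a replica read with the 1.4.x client; this assumes the bucket has replicas configured and a connected `CouchbaseClient` named `client`, and `getFromReplica` may return data that is behind the active copy:

```java
// Try the active node first.
Object value = client.get("mykey");
if (value == null) {
    // Fall back to a replica copy. Tradeoff: the replica may not yet
    // have the latest write (eventual consistency on replicas).
    value = client.getFromReplica("mykey");
}
```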
The client will try to reconnect the socket with an increasing interval and will keep trying until you shut it down or it gets a new configuration indicating that the target node is gone (for example, because you initiated a manual failover or automatic failover was triggered).
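The reconnect schedule itself is internal to the client, but the "increasing interval" idea can be sketched as a simple capped exponential backoff; the values below are illustrative, not the client's actual timings:

```java
public class ReconnectBackoff {

    // Illustrative only: delay doubles per attempt, capped at capSeconds.
    // This mirrors the spirit of an increasing reconnect interval, not the
    // client's real internal schedule.
    static long delaySeconds(int attempt, long capSeconds) {
        long delay = 1L << Math.min(attempt, 62);
        return Math.min(delay, capSeconds);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                + delaySeconds(attempt, 30) + "s");
        }
    }
}
```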