Fallback/failover problems


We are trying out Couchbase, but now we are facing two problems regarding fallbacks/failover and we need some support here.
We are using the following setup:
3 Couchbase machines with a replication factor of 2.
Version 4.5.0-2601 Enterprise Edition (build-2601).
libcouchbase runtime version 2.6.3
libcouchbase headers version 2.6.1

Problems:

  1. How can I read information from one replica? We are using the following code, but the "getFromReplica" method returns nothing (and we don't see any activity in the Couchbase web interface regarding the get).

    $cluster = new CouchbaseCluster('couchbase://');
    $bucket = $cluster->openBucket('default');
    $key = 'testKey';
    $bucket->upsert($key, 'value');
    $result = $bucket->getFromReplica($key); // returns nothing

  2. We are getting timeouts from the cluster if one replica is down, until auto-failover removes the node. Is this normal? Can we avoid this?

Vítor Loureiro
HoP from the Jumia Mall team

This was a bug. I’ve fixed it in https://issues.couchbase.com/browse/PCBC-438 and the fix will be released with the next release.

Timeouts when reading from the replica, or just when getting documents from that node?


First of all, when you size your cluster: if you want to survive one node failure while keeping 2 replica copies, you will need at least 4 nodes. Otherwise, in case of a node failure, you cannot write data, since you cannot make two copies.

Second, Couchbase supports "read your own writes" (RYOW), but it only works when reading from the primary copy. That is why the best practice is to always read from the primary, and read from a replica only in case of failure.

In your example, you can try upsert with the option replicate_to = 2. That way, the write operation should block until replication is done.
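As a minimal sketch with the 2.x PHP SDK (assuming the `default` bucket from the example above; `replicate_to` is the option key as documented for the SDK, but verify it against your SDK version):

```php
<?php
$cluster = new CouchbaseCluster('couchbase://localhost');
$bucket = $cluster->openBucket('default');

// Block until the write has been propagated to 2 replica nodes.
// On a 3-node cluster with one node down this durability requirement
// cannot be met, which is why at least 4 nodes are needed to keep
// 2 replica copies through a failure.
$bucket->upsert('testKey', 'value', ['replicate_to' => 2]);
```

If the durability requirement cannot be satisfied, the operation throws an exception instead of silently writing fewer copies.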

Regarding 2), it is normal that in case of a node failure, 1/n of the reads will time out. In that case you can catch the error and read from a replica.
For writes, 1/n of the operations will fail. You can implement a retry strategy in your client code to ensure that the write succeeds after the failover happens. From that point on, everything works fine again.
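The catch-and-fall-back pattern for reads can be sketched like this (assuming the `$bucket` and `$key` from the earlier example; the exception class name is from the PHP SDK, but check your version):

```php
<?php
try {
    // Read from the active (primary) copy first, to preserve
    // read-your-own-writes semantics.
    $result = $bucket->get($key);
} catch (CouchbaseException $e) {
    // On a timeout (node down but not yet failed over),
    // fall back to a replica copy.
    // Note: a replica read may return stale data.
    $result = $bucket->getFromReplica($key);
}
```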

That is a CP scenario in terms of the CAP theorem. In this case Couchbase works as CP, giving Consistency precedence over Availability.

First of all thank you for the quick feedback.
That was just a simple example to demonstrate that the function wasn't working as intended; we plan to use getFromReplica only in case of a timeout.

Reading data from that node: if Couchbase already knows that the node is down, why do we need to wait until the timeout occurs?

I can't find any place in the PHP documentation with the replicate_to option, or any other option at all. Can you point me to the updated documentation?

What type of retry strategy can we use to ensure that writes will not fail in case of failover?

Thank you in advance.

Because the node hasn't been failed over yet.

The examples in this repository include one for durability.

For example, exponential backoff, which increases the wait interval up to some point, then keeps retrying at that fixed interval until the maximum retry limit is reached, and then returns an error to the application.
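A minimal sketch of that strategy (hypothetical helper functions, not part of the Couchbase PHP SDK):

```php
<?php
// Delay for a given attempt: doubles from a base value, capped at $capMs.
// attempt 0 -> 100ms, 1 -> 200ms, 2 -> 400ms, ... then fixed at 2000ms.
function backoffDelayMs(int $attempt, int $baseMs = 100, int $capMs = 2000): int {
    return min($baseMs * (2 ** $attempt), $capMs);
}

// Retry an operation with exponential backoff; rethrow the last error
// once the maximum number of attempts is exhausted.
function retryWithBackoff(callable $op, int $maxAttempts = 8) {
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        try {
            return $op();
        } catch (Exception $e) {
            if ($attempt === $maxAttempts - 1) {
                throw $e; // give up: surface the error to the application
            }
            usleep(backoffDelayMs($attempt) * 1000);
        }
    }
}
```

You would wrap the failing write (e.g. the `upsert`) in `retryWithBackoff`, so that once the failover completes, one of the retries succeeds.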

Sorry, but that documentation is not enough; I really need to see all possible options and values for all functions.

We saw your fix and we are trying to work with it, but Couchbase is taking too much time to reply with a timeout. Can we set this timeout on the client side? How?

Right now we have an issue with the API reference S3 bucket, but you can take a look at the phpdocs:

Also you can specify timeouts through cluster URL
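For example (a sketch assuming a libcouchbase-style connection string, where `operation_timeout` is given in seconds; verify the option name against your SDK version):

```php
<?php
// 2.5 second operation timeout set via the connection string
$cluster = new CouchbaseCluster('couchbase://localhost?operation_timeout=2.5');
$bucket = $cluster->openBucket('default');
```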

Or you can use a setter method for that:

For example:

$bucket->operationTimeout(500000); // 500k microseconds = 0.5 second

It was actually my issue; my browser had cached a 301 redirect. The docs are here