Fallback/failover problems

vitor.loureiro · November 2, 2016, 3:22pm

Hi

We are trying out Couchbase, but we are now facing two problems regarding fallback/failover and need some support.
We are using the following setup:
3 couchbase machines with the replication factor of 2.
Version 4.5.0-2601 Enterprise Edition (build-2601).
libcouchbase runtime version 2.6.3
libcouchbase headers version 2.6.1

Problems :

1. How can I read the information from one replica? We are using the following code, but the “getFromReplica” method returns nothing (and we don’t see any activity in the Couchbase web interface regarding the get).

$cluster = new CouchbaseCluster('couchbase://127.0.0.1');
$bucket = $cluster->openBucket('default');
$key = 'testKey';
$bucket->upsert($key, 'value');
var_export($bucket->getFromReplica($key));

2. We are getting timeouts from the cluster if one replica is down, until auto-failover removes the node. Is this normal? Can we avoid it?

Vítor Loureiro
HoP from the Jumia Mall team

avsej · November 2, 2016, 8:26pm

This was a bug. I’ve fixed it, and the fix will ship with the next release.

Do the timeouts occur when reading from a replica, or just when getting documents from that node?

manuel.hurtado · November 2, 2016, 8:26pm

Hello,

First of all, when you size your cluster, if you want to support one node failure with 2 replica copies you will need at least 4 nodes. Otherwise, in case of node failure, you cannot write data since you cannot make two copies.

Second, Couchbase supports “read your own write” (RYOW), but only when reading from the primary copy. That is why the best practice is to always read from the primary, and to read from a replica only in case of failure.

In your example, you can try upsert with the option replicate_to = 2. That way the write operation should block until replication is done.

Regarding 2), it is normal that in case of a node failure, 1/n of the reads will time out. In that case you can catch the error and read from a replica.
For writes, 1/n of them will fail. You can implement a retry strategy in your client code to make sure the write goes through once the failover has happened. From that point on, everything works fine again.

That is a CP scenario in terms of the CAP theorem. In this case Couchbase works as CP, giving Consistency precedence over Availability.
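
The "catch the error and read from replica" idea above could look roughly like the sketch below. It is only an illustration: the helper name readWithReplicaFallback and the two stand-in closures are hypothetical, and it treats any exception as a reason to fall back, which is an assumption (real code would check for a timeout specifically).

```php
<?php

/**
 * Try the primary read first; on failure, fall back to a replica read.
 * Both readers are plain callables so the sketch runs without a cluster.
 */
function readWithReplicaFallback(callable $readPrimary, callable $readReplica, string $key)
{
    try {
        return $readPrimary($key);
    } catch (\Exception $e) {
        // Assumption: any exception is treated as "primary unavailable".
        // Real code should inspect the error and fall back only on timeouts.
        return $readReplica($key);
    }
}

// Stand-ins for a node that is down and a replica that still has the data.
$primaryDown = function (string $key) {
    throw new \RuntimeException("timeout reading $key");
};
$replica = function (string $key) {
    return "replica copy of $key";
};

echo readWithReplicaFallback($primaryDown, $replica, 'testKey'), PHP_EOL;
```

With the SDK, the two callables would wrap $bucket->get($key) and $bucket->getFromReplica($key). Keep in mind that a replica read may return an older value, since RYOW only holds for the primary copy.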

vitor.loureiro · November 3, 2016, 10:44am

First of all thank you for the quick feedback.
That was just a simple example to demonstrate that the function wasn’t working as intended. We plan to use getFromReplica only in case of a timeout.

The timeouts occur when reading data from that node. If Couchbase already knows that the node is down, why do we need to wait until the timeout occurs?

I can’t find any place in the PHP documentation that mentions the replicate_to option, or any other option at all. Can you point me to the updated documentation?

What type of retry strategy can we use to assure that writes will not fail in case of failover?

Thank you in advance.

avsej · November 3, 2016, 11:15am

Because the node hasn’t been failed over yet.

Examples in this repository contain one for durability

github.com

couchbaselabs/devguide-examples/blob/master/php/durability.php

<?php

/*
 * This sample assumes that the bucket has configured replication
 */

$cluster = new \Couchbase\Cluster('couchbase://localhost');
$bucket = $cluster->openBucket('default');

/*
 * In the PHP SDK you can specify "maximum" persistence and
 * replication by specifying -1 for either value
 */
$bucket->upsert('docid', ['some' => 'value'], ['persist_to' => -1, 'replicate_to' => -1]);

// Store with persisting to master node
$bucket->upsert('docid', ['some' => 'value'], ['persist_to' => 1]);

// Note, this will fail if there are no replicas online
$bucket->upsert('docid', ['some' => 'value'], ['persist_to' => 1, 'replicate_to' => 1]);

For example, exponential backoff: increase the wait interval between retries up to some point, then keep retrying at that fixed interval until a maximum retry limit is reached, and then return an error to the application.
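
A minimal sketch of that retry strategy, assuming the write is wrapped in a callable that throws on failure. The helper name retryWithBackoff, its parameters, and the default values are all hypothetical and not part of the SDK; the flaky closure only simulates a node being down.

```php
<?php

/**
 * Retry $operation with exponential backoff capped at $maxDelayMs.
 * Rethrows the last exception once $maxAttempts is exhausted.
 */
function retryWithBackoff(callable $operation, int $maxAttempts = 8, int $baseDelayMs = 50, int $maxDelayMs = 1000)
{
    $delayMs = $baseDelayMs;
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // give up and surface the error to the application
            }
            usleep($delayMs * 1000);
            // Double the wait until it reaches the cap, then stay at the cap.
            $delayMs = min($delayMs * 2, $maxDelayMs);
        }
    }
}

// Stand-in for a write that fails twice (node down) and then succeeds.
$calls = 0;
$flakyWrite = function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new \RuntimeException('temporary failure');
    }
    return 'stored';
};

$result = retryWithBackoff($flakyWrite, 5, 1, 4);
echo $result, ' after ', $calls, ' attempts', PHP_EOL;
```

In real code the callable would wrap something like $bucket->upsert(...), and the total retry window should be longer than the auto-failover timeout so that a retry lands after the node has been removed.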

vitor.loureiro · November 3, 2016, 5:59pm

Hi
Sorry, but that documentation is not enough. I really need to see all possible options and values for all functions.

We saw your fix and we are trying to work with it, but Couchbase is taking too much time to reply with a timeout. Can we set this timeout on the client side? How?

avsej · November 3, 2016, 8:23pm

Right now we have an issue with the S3 bucket that hosts the API reference, but you can take a look at the phpdocs:
https://github.com/couchbase/php-couchbase/blob/master/stub/CouchbaseBucket.class.php#L135-L146

You can also specify timeouts through the cluster URL:
http://developer.couchbase.com/documentation/server/current/sdk/php/client-settings.html

Or use the setter method for that:
https://github.com/couchbase/php-couchbase/blob/master/stub/CouchbaseBucket.class.php#L543

For example:

$bucket->operationTimeout(500000); // 500k microseconds = 0.5 second

avsej · November 3, 2016, 9:56pm

It was actually my issue: my browser had cached a 301 redirect. The docs are here:
http://docs.couchbase.com/sdk-api/couchbase-php-client-2.2.3/classes/CouchbaseBucket.html#method_upsert
