Auto fail-over?

itay · January 25, 2015, 7:50am

Regarding fail over, I just want to be sure that my assumptions are correct.

Assumptions:

If I have 4 servers with 1 replica and 1 server fails and shuts down, ~25% of the data will be missing?
This can be corrected if I fail over manually. However, until I do so, data will still be missing, even though I have a replica?
This can be corrected automatically if I set up auto fail-over. However, until the automatic fail-over settles, there can be a few minutes of missing data?
There is no way to avoid downtime entirely?

Itay

cihangirb · January 26, 2015, 5:17pm

That is correct if you want to favor consistency in our system. If you prefer availability, you can do a few things:

You can read from replicas even before failover takes effect (manual or auto).
You can set up a separate cluster and maintain bidirectional replication through XDCR. You are not bound by failover logic in that case. You can write to either cluster, and bidirectional replication will take the mutation on one cluster and replicate it to the other. We have customers who do this with 200ms timeouts on their operations. Many customers do this if they can tolerate an AP system as opposed to a CP system.
Do you favor availability over consistency? Would this work for you?
thanks
-cihan

itay · January 29, 2015, 2:46pm

@cihangirb, thanks

I think that the web server can automatically detect that a CB server is down and issue its queries against the replica instead.
Can I do that?

cihangirb · January 29, 2015, 4:31pm

You can certainly do that based on a timeout: your retry can issue a replica read. However, replica reads come with some downsides. For example, if the timeout on the original read was due to an overloaded server (it could not respond in time) and your replica was not fully caught up on replication, your replica read will return stale data. As long as you are OK trading consistency for better availability, you are in good shape in these cases.

You can avoid this situation by issuing your writes with the ReplicateTo flag, which ensures the replica is updated before you get the write ack. With that you don’t have the issue, but ReplicateTo has a higher latency due to the laws of physics (and we don’t have a way to beat that yet :)).
thanks
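
For readers following along, the timeout-then-replica-read pattern described above could look roughly like the sketch below. It is not Couchbase SDK code: `read_active` and `read_replica` are hypothetical callables standing in for whatever your client library provides, and the 0.2 second default simply mirrors the 200ms figure mentioned earlier in the thread. The returned source tag is an assumption added so the caller can tell when a value may be stale.

```python
def read_with_replica_fallback(key, read_active, read_replica, timeout_s=0.2):
    """Try the active copy first; fall back to a replica on timeout or connection error.

    read_active(key, timeout_s) and read_replica(key) are hypothetical callables
    supplied by the caller; they are not real SDK functions.
    Returns (value, source), where source is "active" or "replica".
    """
    try:
        return read_active(key, timeout_s), "active"
    except (TimeoutError, ConnectionError):
        # The active node is down or overloaded. The replica may lag behind,
        # so this value can be stale.
        return read_replica(key), "replica"


# Illustration with in-memory stand-ins: the replica has not yet received "v2".
active_store = {"user::1": "v2"}
replica_store = {"user::1": "v1"}

def node_down(key, timeout_s):
    raise TimeoutError("active node did not respond in time")

value, source = read_with_replica_fallback("user::1", node_down, replica_store.__getitem__)
print(value, source)
```

In the illustration the fallback returns the older value from the replica, which is the stale-read trade-off described above. Tagging the result with its source lets the application decide whether a possibly stale value is acceptable for that request.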
