Auto fail-over?

Regarding fail over, I just want to be sure that my assumptions are correct.

Assumptions:

  1. If I have 4 servers with 1 replica, and 1 server fails and shuts down, ~25% of the data will be missing?
  2. This can be corrected if I manually fail over. However, until I do so, the data will still be missing, even though I have a replica?
  3. This can be auto-corrected if I set up auto fail-over. However, until the automatic fail-over settles, there can be a few minutes of missing data?
  4. There is no way to avoid downtime entirely?
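
As a rough illustration of the arithmetic behind assumption 1, here is a minimal sketch. It assumes data is split into a fixed number of partitions spread evenly over the nodes; the partition count of 1024, the round-robin placement, and the function name are assumptions made for the example, not details taken from this thread.

```python
def unavailable_fraction(num_nodes: int, failed_nodes: int,
                         num_partitions: int = 1024) -> float:
    """Fraction of partitions whose active copy lived on a failed node."""
    # Assumed placement: partition p is active on node p % num_nodes.
    active_owner = [p % num_nodes for p in range(num_partitions)]
    # Assume the first `failed_nodes` nodes are the ones that went down.
    failed = set(range(failed_nodes))
    lost = sum(1 for owner in active_owner if owner in failed)
    return lost / num_partitions


# 4 servers, 1 down: a quarter of the active data is unreachable
# until a fail-over promotes the replicas.
print(unavailable_fraction(4, 1))  # 0.25
```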

Itay

That is correct if you want to favor consistency in our system. If you prefer availability, you can do a few things:

  • You can read from replicas even before failover (manual or auto) takes effect.
  • You can set up a separate cluster and maintain bidirectional replication through XDCR. In that case you are not bound by failover logic: you can write to either cluster, and bidirectional replication will take the mutation and replicate it to the other. We have customers who do this with 200ms timeouts on their operations. Many customers do this if they can tolerate an AP system as opposed to a CP system.
    Do you favor availability over consistency? Would this work for you?
    thanks
    -cihan

@cihangirb, thanks

I think the web server could automatically detect that a CB server is down and run its queries against the replica.
Can I do that?

You can certainly do that based on a timeout: your retry can issue a replica read. However, replica reads come with some downsides. For example, if the timeout on the original read was due to an overloaded server (one that could not respond in time) and the replica was not fully caught up on replication, your replica read will return stale data. As long as you are OK trading consistency for better availability, you are in good shape in these cases.
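
The retry pattern above could look roughly like the sketch below. It uses in-memory stand-ins rather than a real SDK: `FakeNode`, `NodeTimeout`, and `read_with_replica_fallback` are hypothetical names invented for this illustration, and the "timeout" is simulated by a flag instead of a real network call.

```python
class NodeTimeout(Exception):
    """Raised when a node does not answer within the caller's timeout."""


class FakeNode:
    """Hypothetical in-memory stand-in for one server (not a Couchbase API)."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def get(self, key):
        if not self.up:
            raise NodeTimeout(key)  # simulated timeout
        return self.data[key]


def read_with_replica_fallback(active, replica, key):
    """Return (value, source); fall back to the replica if the active times out."""
    try:
        return active.get(key), "active"
    except NodeTimeout:
        # The replica may lag behind the active copy, so this can be stale.
        return replica.get(key), "replica"


active, replica = FakeNode(), FakeNode()
active.data["user::1"] = "v2"   # latest write reached the active copy
replica.data["user::1"] = "v1"  # replication has not caught up yet
active.up = False               # simulate an overloaded or failed server
print(read_with_replica_fallback(active, replica, "user::1"))  # ('v1', 'replica')
```

Note that the fallback returns the older value `v1`: the read stays available, but at the cost of consistency.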

You can avoid this situation by issuing your writes with the ReplicateTo flag, which ensures the replica is updated before you get the write ack. With that you don’t have the stale-read issue, but ReplicateTo has higher latency due to the laws of physics (and we don’t have a way to beat that yet :)).
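
To make the trade-off concrete, here is a toy model of the idea, not the SDK itself. `ToyCluster` and its `replicate_to` parameter are hypothetical names for this sketch; real replication is asynchronous over the network, which is simulated here with a pending queue.

```python
class ToyCluster:
    """Hypothetical two-copy store: one active, one replica (not a Couchbase API)."""

    def __init__(self):
        self.active = {}
        self.replica = {}
        self.pending = []  # mutations not yet applied to the replica

    def write(self, key, value, replicate_to=0):
        self.active[key] = value
        self.pending.append((key, value))
        if replicate_to >= 1:
            # Wait for the replica before acknowledging; in a real system
            # this wait is where the extra latency comes from.
            self.flush()
        return "ack"

    def flush(self):
        """Apply all queued mutations to the replica, in order."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


c = ToyCluster()
c.write("k", "v1")                  # ack returns before replication
print(c.replica.get("k"))           # None: a replica read here would miss the write
c.write("k", "v2", replicate_to=1)  # ack only after the replica has the mutation
print(c.replica.get("k"))           # v2
```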
thanks
