XDCR
I have a couple of questions regarding XDCR:
1. I read: "In the event of a more significant network failure where the destination cluster is unavailable for more than 30 seconds, you may need to delete and recreate the replication." Does this mean that we have to recreate replication configs manually during network failures? Doesn't XDCR queue up the items to be replicated?
2. In the event of a more complicated document conflict, will XDCR error out and let us resolve the conflict manually?
Regarding your point (1): I believe the docs are misleading here. I've personally tested XDCR by severing the link (taking down the remote cluster) and then reconnecting it, and the link worked just fine afterwards, even after several hours of downtime.
Yes, thanks, MichaelL. We'll fix the documentation.
In general, XDCR is very resilient to network failures. It maintains checkpoints and only resends data from the last checkpoint.
Out of curiosity, how many nodes are you testing XDCR on and for how many documents?
It was a small test: two nodes in one cluster and one in the other, with about a million records. I let the replication get roughly halfway, then shut down the remote cluster, went to lunch, and restarted it afterwards. Replication continued just fine once the remote cluster was back up.
Great, good to hear that it worked well. Good luck with your project.
1. For typical network failures, XDCR keeps retrying: it restarts a failed replication after the interval set by xdcrFailureRestartInterval (the interval for restarting failed XDCR). XDCR also creates a checkpoint every 30 minutes. When it restarts, it checks whether the mutations made since the last checkpoint (up to 30 minutes' worth) have already been replicated. For mutations that have, it only performs the check and does not resend the documents; mutations that have not been replicated are sent again. So it's pretty resilient to network failures.
The checkpoint interval (xdcrCheckpointInterval) can also be changed, but this is not recommended, particularly for bidirectional replication.
If a much bigger network problem requires manual intervention, for example because the destination cluster has a new IP address or is an entirely new cluster, you should delete and recreate the replication. If it's an entirely new cluster, XDCR will need to resend the entire dataset.
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-resta...
2. XDCR currently resolves conflicts automatically, based on the number of mutations on each document: the version with more mutations wins. Couchbase 2.0 doesn't support custom conflict resolution.
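To make the retry-and-resume behaviour from point 1 concrete, here is a small in-memory simulation. It is a sketch under stated assumptions, not Couchbase's implementation: Destination, Replicator, replicate, fail_after and checkpoint_every are hypothetical names, the checkpoint is taken every N mutations rather than every 30 minutes, and the "already replicated?" check is modelled as a simple revision comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Destination:
    """Hypothetical stand-in for the remote cluster: key -> (rev, value)."""
    docs: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    fail_after: Optional[int] = None  # simulated outage: link drops after this many writes
    writes: int = 0

    def has(self, key: str, rev: int) -> bool:
        # Models the "already replicated?" check: does the remote hold this revision or newer?
        return self.docs.get(key, (0, ""))[0] >= rev

    def write(self, key: str, rev: int, value: str) -> None:
        if self.fail_after is not None and self.writes >= self.fail_after:
            raise ConnectionError("destination cluster unavailable")
        self.docs[key] = (rev, value)
        self.writes += 1

    def reconnect(self) -> None:
        self.fail_after = None  # simulated: the remote cluster is back up


@dataclass
class Replicator:
    mutations: List[Tuple[str, int, str]]  # source mutations in order: (key, rev, value)
    checkpoint_every: int                  # hypothetical stand-in for xdcrCheckpointInterval
    checkpoint: int = 0                    # position up to which progress is persisted
    sent: int = 0
    skipped: int = 0

    def run_once(self, dest: Destination) -> None:
        """Replicate from the last checkpoint; raises ConnectionError if the link drops."""
        for i in range(self.checkpoint, len(self.mutations)):
            key, rev, value = self.mutations[i]
            if dest.has(key, rev):
                self.skipped += 1  # already on the destination: check only, no resend
            else:
                dest.write(key, rev, value)
                self.sent += 1
            if (i + 1) % self.checkpoint_every == 0:
                self.checkpoint = i + 1  # persist progress
        self.checkpoint = len(self.mutations)


def replicate(rep: Replicator, dest: Destination, max_restarts: int = 10) -> int:
    """Restart after failures and resume from the checkpoint; returns the restart count."""
    restarts = 0
    while True:
        try:
            rep.run_once(dest)
            return restarts
        except ConnectionError:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Real XDCR would wait xdcrFailureRestartInterval here; the simulation
            # just brings the destination back before the next attempt.
            dest.reconnect()
```

For example, with ten mutations, a checkpoint every four, and a link that drops after six writes, the run restarts once, resumes at the checkpoint after mutation four, only checks the two mutations that had already arrived, and sends the remaining four. A coarse checkpoint interval means a restart may re-check up to one interval's worth of mutations, but nothing already on the destination is sent twice.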
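The rule in point 2 can be sketched as a deterministic comparison that each cluster applies on its own, which is why no manual step is involved: each side keeps whichever version has seen more mutations. This is a simplified model, not Couchbase's code: DocVersion and resolve are hypothetical names, and the CAS tie-breaker for equal mutation counts is an assumption beyond what this thread states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    rev_seqno: int  # number of mutations this copy of the document has seen
    cas: int        # assumed tie-breaker; not described in the thread
    value: str


def resolve(local: DocVersion, remote: DocVersion) -> DocVersion:
    """Pick the winner: more mutations wins; equal counts fall back to CAS (assumed).

    max() returns its first argument on a full tie, so the local copy is kept.
    """
    return max(local, remote, key=lambda d: (d.rev_seqno, d.cas))
```

Because the comparison depends only on the two versions, resolve(a, b) and resolve(b, a) pick the same winner whenever the versions differ, so both clusters converge on the same document without any custom logic.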