XDCR "aggressiveness"

Hi,

I am in the situation where I only have 2 nodes to run CB. This means I have to use XDCR replication to make sure both nodes have the same information.

I have XDCR up and running, but I notice that it is sort of "lazy". When I put write/read load on the nodes, each CB instance at some point has thousands of pending replications, which means the systems are not in sync. After a while it seems to pick up speed, and after 4 to 5 minutes it is in sync.

Unfortunately, by that time the data in CB is a mess: update and delete operations have gone wrong because the application tries to delete a record on CB instance B that was added on CB instance A and has not been replicated yet.

I've experimented with XDCR settings such as xdcrOptimisticReplicationThreshold and xdcrMaxConcurrentReps. It appears to get in sync quicker, but it still takes 4 to 5 minutes, and that is way too long.
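
For readers who want to try the same two settings, here is a minimal sketch of the request body such a change might use. The values are purely illustrative, and the endpoint path is an assumption (it differs between Couchbase Server versions), so check the REST API reference for your release before sending anything.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; tune them against your own workload.
settings = {
    "xdcrMaxConcurrentReps": 64,
    "xdcrOptimisticReplicationThreshold": 1024,
}

# Assumed host and endpoint path; verify both for your Couchbase Server version.
url = "http://localhost:8091/settings/replications"

# The settings are sent as a form-encoded POST body.
body = urlencode(settings)

print(f"POST {url}")
print(body)
```

The sketch only builds and prints the request; sending it would additionally need administrator credentials.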

For my application I need instant replication between 2 CB instances.

Please provide some hints towards a solution.

Best Regards,
Erik

1 Answer

Erik,

Do you mean that you have only a 1-node cluster on each side?
What kind of hardware are you running these on? What is your front-end workload?
If you want high availability, you need to use intra-cluster replication. If you have more than one node, you can set the number of intra-cluster replicas to 1 and it will create a replica copy for you. You can manually fail over a node to promote the replicas to active.

If you are looking for disaster recovery, you can use XDCR, but XDCR is a heavier operation and requires a minimum of 3 nodes. It works on a per-partition basis and by default uses 32 streams. These streams round-robin across the 1024 vBuckets. It uses data that has been persisted to disk, meaning that you have to wait for data to be persisted before it gets picked up by the XDCR engine.
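
To picture the round-robin described above, here is a simplified model in Python. It assumes a strict modulo assignment of vBuckets to streams, which is only an illustration; the actual scheduling inside the XDCR engine is not spelled out in this thread and may differ.

```python
from collections import defaultdict

NUM_VBUCKETS = 1024  # vBucket count mentioned above
NUM_STREAMS = 32     # default stream count mentioned above


def assign_round_robin(num_vbuckets, num_streams):
    """Map each stream to the vBuckets it would visit under strict round-robin."""
    assignment = defaultdict(list)
    for vb in range(num_vbuckets):
        assignment[vb % num_streams].append(vb)
    return dict(assignment)


assignment = assign_round_robin(NUM_VBUCKETS, NUM_STREAMS)

print(len(assignment))     # number of streams: 32
print(len(assignment[0]))  # vBuckets handled by stream 0: 32
print(assignment[0][:4])   # first vBuckets visited by stream 0: [0, 32, 64, 96]
```

Under this model each stream has to cycle through 32 vBuckets, so a freshly persisted change may wait for its vBucket's turn before it is sent.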

XDCR can be done in seconds if not milliseconds depending on various factors. Hope this helps explain how XDCR works.

Dipti,
XDCR requiring 3 nodes seems a bit strange.
In earlier posts, users were recommended to use two single-node clusters with XDCR between them rather than a dual-node cluster with replication. The reason was that such a dual-node cluster does not seem to give access to the replicated documents of a failed server. E.g.:
http://www.couchbase.com/forums/thread/if-one-two-nodes-goes-down-then-a...

Also in our (=Erik's) case, two nodes are sufficient for capacity and performance. And it should also be sufficient for redundancy. How can we ensure fast replication, plus availability of all data when one node goes down?