Delay in XDCR replication in Couchbase

Hello,

There is a delay in my XDCR replication.
We have 6 clusters, each with 3 nodes.

I'm using Couchbase 2.2.0_x86_64.

1 master cluster, 5 slave clusters

Across the cluster we are using 170 GB of RAM for Couchbase, yet there is still a heavy delay in XDCR.
The lag counts (in documents) are below:

BUCKET = uid

Cluster 1 = 236036
Cluster 2 = 148820
Cluster 3 = 137848
Cluster 4 = 178688

BUCKET = userkey

Cluster 1 = 591201
Cluster 2 = 389658
Cluster 3 = 319156
Cluster 4 = 477773

Please let me know which settings I should configure to avoid lag in XDCR.
Current settings:

XDCR max replications per bucket = 32
XDCR checkpoint interval = 1800
XDCR batch count = 10000
XDCR batch size = 6144
XDCR failure retry interval = 30
XDCR optimistic replication threshold = 1024

Thanks

1 Answer

Hi there,
Replication lag depends on a lot of things. Could you give a sense of the sets/sec on the master cluster and of the topology: do you have one cluster replicating to all others (A>B and A>C), or are there cascading replications (A>B>C)? Can you also tell me how you measure the lag with doc counts? Are you looking at a specific stat in the web UI?

There are a few things you can do to allocate more resources to XDCR.
- Increase the max replications per bucket: with 3 nodes you have ~340 vBuckets per node, but only 32 are being serviced at any one time. You can increase the count to increase the parallelism of replication. This only helps if you have the processing capacity to sustain that parallelism. If your master is replicating to all 5 other clusters, this may be a lot of threads for the number of cores and may hurt performance instead.
- Ensure low latency on the source cluster's IO path: XDCR waits for persistence to disk before it pushes changes to the destination. If you want to lower the replication latency, look for ways to reduce IO latency, especially on the source.
- Make sure the sending and receiving clusters have enough processing capacity: XDCR cannot replicate if the receiving clusters are busy. Ensure you have the processing capacity to send on the master and receive on the destinations.
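
To put rough numbers on the first point, here is a back-of-the-envelope sketch in Python. It assumes the default of 1024 vBuckets per bucket, that the max-replications limit applies per node and per bucket/destination pair, and that both buckets from the question (uid and userkey) replicate to all 5 destinations. The function names are made up for illustration; check the assumptions against your own setup.

```python
import math

# Assumption: default number of vBuckets per bucket.
TOTAL_VBUCKETS = 1024


def vbuckets_per_node(nodes, total=TOTAL_VBUCKETS):
    """Active vBuckets each node owns when they are spread evenly."""
    return math.ceil(total / nodes)


def streams_per_node(max_reps_per_bucket, buckets, destinations):
    """Upper bound on concurrent replication streams on one source node,
    assuming the limit applies per bucket/destination pair."""
    return max_reps_per_bucket * buckets * destinations


nodes, cores = 3, 8
owned = vbuckets_per_node(nodes)                           # 342, i.e. the "~340" above
streams = streams_per_node(32, buckets=2, destinations=5)  # 320

print(f"vBuckets per node:        {owned}")
print(f"serviced at once:         32 of {owned} per replication")
print(f"max streams per node:     {streams}")
print(f"max streams per core:     {streams / cores}")
```

Under these assumptions the current setting already allows up to 320 concurrent streams on an 8-core source node, so raising the limit further is only worth trying if the CPU on the source nodes is not already saturated.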

Looking forward to your answers. Thanks,
-cihan

Thanks for your reply, cihan.

1. Topology = 1 master cluster (3 nodes) replicating to 5 slave clusters (3 nodes each)
2. Delay count = I ran the command below against each cluster at the same time
and computed the difference based on the item count:
curl -uAdministrator:** http://*******:8091/pools/default/buckets/uid

3. Processor: 8 cores / model name: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz

4. Memory: 60 GB per node, 170 GB across the cluster for Couchbase

5. Master: 10.9K ops per second, slave: 2.08K ops per second
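
For reference, the item-count comparison from point 2 could be scripted roughly as below. This is only a sketch: it assumes the bucket details returned by /pools/default/buckets/<bucket> carry the count under basicStats.itemCount, and the helper names, host names, and credentials are placeholders. Note that an item-count difference is only a rough proxy for XDCR lag, since deletes and expirations also change the counts.

```python
import base64
import json
import urllib.request


def fetch_bucket(host, bucket, user, password):
    """GET the bucket details from the REST API on port 8091 (same URL as the curl call)."""
    url = f"http://{host}:8091/pools/default/buckets/{bucket}"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def item_count(bucket_info):
    """Item count from the parsed bucket JSON (assumed field: basicStats.itemCount)."""
    return bucket_info["basicStats"]["itemCount"]


def lag_in_docs(master_info, slave_info):
    """Documents the slave is behind the master, measured by item count only."""
    return item_count(master_info) - item_count(slave_info)


# Example usage (placeholder hosts and credentials, not run here):
# master = fetch_bucket("master-host", "uid", "Administrator", "password")
# slave = fetch_bucket("slave-host", "uid", "Administrator", "password")
# print(lag_in_docs(master, slave))
```

Fetching the master and slave counts as close together in time as possible matters here, because at about 10.9K ops per second on the master even a short gap between the two requests shifts the result noticeably.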