Xdcr and persistence blocked each other

Covering · December 18, 2013, 3:05am

We have a very heavy write load once a day that lasts 1-2 hours, with no further writes afterward. We found that XDCR didn’t finish. In fact, persisting to disk could not complete until I deleted XDCR (about 6 hours). Once XDCR was deleted, the disk write queue grew quickly from 800k to 20M and then decreased to 0; beam.smp memory usage dropped from 4.5 GB to 1.6 GB; after a while the docs fragmentation decreased from 43% to 41%.

So I guess XDCR and persistence compete with each other, which blocks persistence, fragment compaction, XDCR, and even writing data (the first three involve disk; the last involves a buffer?).

An easy workaround seems to be a scheduled XDCR that uses a timer to start. Alternatively, Couchbase could detect that competition and stop XDCR until the writes complete.

Are there any other solutions I can try? Thanks.

drigby · December 18, 2013, 4:36pm

XDCR replicates after the document has been written to disk, so XDCR itself shouldn’t affect how quickly a write goes to disk; however, each write will of course generate an equivalent write across the network to the other data centre. XDCR itself is continuous once enabled, so I don’t understand your question about a timer.

It sounds like you may not have a sufficiently sized cluster to handle the load you are putting on it. If you haven’t already, it might be worth reviewing the sizing guidelines: http://docs.couchbase.com/couchbase-manual-2.2/#sizing-guidelines

Covering · December 19, 2013, 4:23am

The problem is that we are using Hadoop to write to Couchbase, so disk I/O is never sufficient. When XDCR is not enabled, it takes the cluster about 30 minutes to drain the “disk write queue” after the Hadoop job completes. When XDCR is enabled, the queue still has not drained after 5 hours. So is it possible to tell XDCR to sleep for some time when the disk queue is too large? I was thinking a timer could be used so that XDCR pauses while the timer is running.
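
To make the idea concrete, here is a minimal sketch of the decision logic only, using two thresholds so that replication would not flap on and off around a single value. Everything in it is an assumption for illustration: the `XdcrThrottle` name, the threshold values, and the "pause"/"resume" actions are hypothetical, and this thread does not establish that the server offers any way to pause or resume XDCR.

```python
from dataclasses import dataclass


@dataclass
class XdcrThrottle:
    """Hypothetical helper: decides when replication should be paused.

    The thresholds are illustrative values, not Couchbase recommendations.
    """
    high_water: int = 1_000_000  # pause once the disk write queue reaches this
    low_water: int = 100_000     # resume once the queue drains to this
    paused: bool = False

    def decide(self, disk_write_queue: int) -> str:
        """Return "pause", "resume", or "hold" for the latest queue sample."""
        if not self.paused and disk_write_queue >= self.high_water:
            self.paused = True
            return "pause"
        if self.paused and disk_write_queue <= self.low_water:
            self.paused = False
            return "resume"
        # Between the two thresholds, keep the current state.
        return "hold"
```

The caller would sample the disk write queue periodically and act only on "pause" or "resume"; how that action would actually be carried out is left open here.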

drigby · December 19, 2013, 10:17am

The typical use-case for XDCR is to maintain an exact, up-to-date clone of one cluster at another (normally remote) cluster. Hence it would be undesirable to pause or stop XDCR once it is set up.

Note that XDCR never “stops” in the general sense - it is a continuous stream from source to destination cluster. There is an initial burst of data to bring the clusters in sync, but after that any updates to the source will be streamed to the destination.

As mentioned previously, it sounds like you need a larger cluster to provide the additional resources needed by XDCR. There is a good blog post on sizing for XDCR at: http://blog.couchbase.com/how-many-nodes-part-2-sizing-couchbase-server-20-cluster
