I’m experimenting with XDCR (Couchbase 4.0) and have noticed some very strange behavior after both clusters have been idle for a while. It almost feels as if replication is frozen, or was asleep and is waking up very slowly.
I’ll explain my experiment in more detail and show you a few graphs.
I have two single-node clusters with bidirectional Cross Datacenter Replication configured between them. I start by inserting ~47K documents into one of the clusters. On the Outbound XDCR Operations graphs, the number of mutations keeps growing while nothing is replicated.
Here is the first graph.
The number of mutations is 140K because I inserted the same 47K documents three times.
The mutations graph stays flat for about 3-4 minutes, and then 1,000 documents get replicated. The graphs look like this.
After another 3-4 minutes, a second batch of 1,000 docs is replicated and the same spike appears on the graphs.
A few minutes later, half of all mutations get replicated and I see this.
A few minutes after that, all the remaining mutations get replicated.
In total, the whole replication process took about 12-13 minutes, and both clusters were doing absolutely nothing during that time. Inserting all the documents took maybe half a minute; after that there was no load on the servers at all.
Here is how the mutations graph looks on the hourly scale.
I started my test at 5:00 pm; the previous small spike is presumably related to expired documents being purged from the database.
And here is how the “mutations replication rate” graph looks.
You can see two small spikes, each where 1,000 docs were replicated, followed by two bigger spikes where 68.5K mutations were replicated.
I was able to reproduce this problem several times, but only when both clusters had been idle for a while. If I inserted more documents right after the first test, the mutation queue drained very quickly and everything replicated almost in real time.
So, does anyone know what could be the problem here?
As I mentioned, I use Couchbase 4.0 on both clusters. All XDCR configuration parameters were left at their default values.
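To double-check that both clusters really are running with default XDCR settings, a small script like the one below could dump them from each node so they can be compared. This is only a sketch: it assumes the cluster-wide XDCR settings are exposed over the REST API at `/settings/replications` on the admin port 8091 (as in Couchbase Server 4.x), and the helper names, host names and credentials are hypothetical placeholders. It uses only Python's standard library.

```python
import base64
import json
import urllib.request


def build_settings_request(host: str, user: str, password: str,
                           port: int = 8091) -> urllib.request.Request:
    """Build a GET request for the cluster-wide XDCR settings.

    Assumption: the endpoint is /settings/replications on the admin port.
    """
    url = f"http://{host}:{port}/settings/replications"
    # HTTP Basic auth header built by hand to avoid extra dependencies.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def fetch_xdcr_settings(host: str, user: str, password: str) -> dict:
    """Send the request and decode the JSON body (needs a reachable node)."""
    req = build_settings_request(host, user, password)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, calling `fetch_xdcr_settings` once per cluster (with your own host names and admin credentials) returns each cluster's settings as a dict, and comparing the two dicts shows whether anything differs from the expected defaults.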