XDCR: Changes not always replicating... goxdcr log shows error "Executing Action timed out"


We use XDCR to sync a database between two servers. Normally this works fine. However, in the past day we have noticed a couple of times where a document we have hand-edited in the Couchbase Server GUI is not replicated when saved. Later we made another change to the same document which then synced okay. Since this success, we haven’t been able to replicate the failure again.

Looking in the goxdcr log, we can see the following error repeated fairly frequently, including around the time that we edited the document:

INFO GOXDCR.GenericSupervisor: Executing Action timed out
INFO GOXDCR.GenericSupervisor: ****************************
INFO GOXDCR.GenericSupervisor: Received timeout error when checking pipeline health. topic=94f6421562b2380b840fb02bb2556896/mybucket/mybucket

What does this error mean and could it be related to the failure to sync?

This is worrying, as we are now wondering whether XDCR is reliable and whether we need to run a periodic “sanity check” to make sure both databases are properly in sync.


Just an update on this statement:

“Later we made another change to the same document which then synced okay. Since this success, we haven’t been able to replicate the failure again.”

This is no longer true. We have had this error repeatedly today. On some docs, we can make another change and it will work, but on others, the document never syncs despite multiple changes. The only way we have got these docs to sync is to remove them and then recreate them. XDCR then spots the new doc and syncs it. Further changes then work as expected.

One further thing we’ve noticed about the failing document, which may be a clue:
On the database that has the modification and is running XDCR, the document has revision “2-1501ca7193a300000000000002000000”.
However, on the target database, the doc has revision “695-1501ca550bef00000000000002000000”.

This seems very strange. As the source copy is newer, I would expect its revision to be higher than the one on the target, but it isn’t. Am I correct to assume this, and could it be the cause of the failed XDCR behaviour?
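To check my reading of these two revs, here is a minimal Python sketch. It rests on two assumptions I haven’t confirmed: that the hex string after the dash packs the CAS (16 hex digits), expiry (8) and flags (8), and that the bucket uses the default sequence-number (“most updates wins”) conflict resolution, where the revision count before the dash is compared first and CAS, expiry and flags only break ties. `parse_rev` and `incoming_wins` are my own hypothetical helpers, not Couchbase APIs.

```python
from typing import NamedTuple


class RevInfo(NamedTuple):
    """Fields ASSUMED to be packed into a rev string like '2-1501ca71...'.

    Field order matters: tuples compare left to right, so this mirrors
    most-updates-wins (revision count first, then the tie-breakers).
    """
    seqno: int   # revision count before the dash
    cas: int     # first 16 hex digits after the dash (assumption)
    expiry: int  # next 8 hex digits (assumption)
    flags: int   # last 8 hex digits (assumption)


def parse_rev(rev: str) -> RevInfo:
    """Split a rev string into its assumed components."""
    seqno_part, meta = rev.split("-", 1)
    if len(meta) != 32:
        raise ValueError(f"unexpected metadata length in {rev!r}")
    return RevInfo(
        seqno=int(seqno_part),
        cas=int(meta[0:16], 16),
        expiry=int(meta[16:24], 16),
        flags=int(meta[24:32], 16),
    )


def incoming_wins(source_rev: str, target_rev: str) -> bool:
    """True if the incoming copy would beat the target's copy (assumed rules)."""
    return parse_rev(source_rev) > parse_rev(target_rev)


source = "2-1501ca7193a300000000000002000000"
target = "695-1501ca550bef00000000000002000000"
print(parse_rev(source).seqno, parse_rev(target).seqno)  # 2 695
print(incoming_wins(source, target))                     # False
```

If those assumptions hold, the source’s CAS is actually the newer one, but its revision count (2) loses to the target’s (695) before CAS is ever looked at, so the incoming change would be discarded. That would match what we see.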

Regarding revision numbers (as per the last bit of my previous reply), we’ve just noticed that even new documents get crazy revision numbers. E.g. I’ve just created a new document through the Couchbase Server GUI and it is given revision number “371-150261d91fd100000000000002000006”

I would expect the first revision of a new document to be “1-…”. Is my assumption correct?

When this document (with rev “371-…”) syncs via XDCR, the destination gets a revision number of “1-…”. We actually have two-way XDCR set up, so a change on either server gets replicated to the other. If I then edit this same doc on the target server and save, it is given a rev of “2-…”. This is then NOT replicated back to the original server, presumably because the original already has a higher revision.
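The two-way case can be modelled the same way. This toy Python model tracks only the revision count on each server. It assumes most-updates-wins on that count alone (ignoring the CAS/expiry/flags tie-breakers) and makes no attempt to explain why the destination starts at “1-…”. `DocCopy` and `replicate` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DocCopy:
    """One server's copy of a document; only the revision count is modelled."""
    server: str
    seqno: int

    def edit(self) -> None:
        # Assumption: each save through the GUI bumps the revision count by one.
        self.seqno += 1


def replicate(src: DocCopy, dst: DocCopy) -> bool:
    """Apply src to dst only if src has strictly more revisions (assumed rule)."""
    if src.seqno > dst.seqno:
        dst.seqno = src.seqno
        return True
    return False


# Revision counts as reported: 371 on the original server, 1 on the other.
original = DocCopy("original", 371)
other = DocCopy("other", 1)

other.edit()                        # hand edit on the other server -> "2-..."
print(replicate(other, original))   # False: 2 does not beat 371
print(original.seqno, other.seqno)  # 371 2
```

Under this model, an edit made on the second server would only be sent back once that copy’s revision count went above 371, which would explain why it never seems to arrive.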