Problem relates to: Couchbase Server Community Edition 4.1.0/4.5.0
I’ve been trying to set up a backup workflow for one of my Couchbase instances.
In my particular use case, having the data from the last 24 hours is absolutely crucial, while older data (up to 3 months) is not essential in a dire disaster recovery scenario. To this end I designed a flow that takes a full backup once a week and a differential backup on each of the remaining days, for a total of 8 backups (the last differential backup overlaps with the next full backup). The idea is to quickly restore a working data set for the last day using the --from-date/--to-date flags, and then rely on Couchbase’s conflict resolution features to restore the remaining data when there is more time.
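The weekly schedule described above could be sketched roughly like this. It is only an illustration: `backup_mode`, `backup_command` and the `FULL_DAY` setting are hypothetical names, the choice of Sunday for the full backup is an assumption, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Assumed setting: ISO weekday (1 = Monday ... 7 = Sunday) of the weekly full backup.
FULL_DAY="${FULL_DAY:-7}"

# Hypothetical helper: print "full" on the full-backup day, "diff" on any other day.
backup_mode() {
    if [ "$1" -eq "$FULL_DAY" ]; then
        echo full
    else
        echo diff
    fi
}

# Hypothetical helper: print (not run) the wrapper command for a given weekday.
# <user> and <pass> are placeholders, as in the commands used in this post.
backup_command() {
    printf 'cbbackupwrapper -u <user> -p <pass> -P 4 -m %s http://couchbase1:8091 backup/\n' \
        "$(backup_mode "$1")"
}

# Show the command that would be used today.
backup_command "$(date +%u)"
```

A cron job could call something like this once a day; the overlap between the last differential backup and the next full one is not handled in this sketch.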
This works flawlessly with the cbbackup/cbrestore tools; they handled all my edge test cases (you guys are awesome). However, the bucket in question is expected to reach about 300GB once the full 3 months of data are in, which would take well over an entire day to back up (I am extrapolating from smaller tests, which took about 2 hours for 20GB). A full restore would take even longer.
So I tried using cbbackupwrapper/cbrestorewrapper, which brought the backup time for those 20GB down to about 15 minutes, which is pretty awesome. The problem is that I lose a lot of data when restoring this way.
How to reproduce:
- Start with an empty bucket (I was using a 2 node Docker setup).
- Create a new document in the bucket.
- Take a full backup (cbbackupwrapper -u <user> -p <pass> -P 4 -m full http://couchbase1:8091 backup/).
- Create a new document and modify the previous one.
- Take a differential backup (cbbackupwrapper -u <user> -p <pass> -P 4 -m diff http://couchbase1:8091 backup/).
- Create a new bucket and restore the backed-up data to it (cbrestorewrapper -u <user> -p <pass> backup/ http://couchbase1:8091 -b old_bucket -B new_bucket).
- Expected: since both backups are from the same day, both documents are restored, with the first document keeping its updated state.
- Actual: only the first document is restored, in its updated state.
What I’ve found out so far:
- There are 1024 vBuckets, and the first full backup tries to back up all of them. With only a single document in the bucket, though, only one of these vBuckets has anything to back up (I’m grouping vBuckets in slots of 100).
- When I perform the differential backup (after the first full one), a differential backup folder is added to the vBucket slot that contained the first document.
- However, in the vBucket slot that contains the second document a full backup folder appears.
- My guess is that the backup tool thinks there are two full backups and chooses the older one.
- If so, this would mean that I’d lose all data from a vBucket that was previously empty when performing a differential backup.
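A quick way to check this guess would be to count the full and differential backup folders under each vBucket slot. The sketch below assumes that every slot is a top-level directory under backup/ and that the backup folders end in `-full` and `-diff`; both are assumptions about the on-disk layout and should be compared with the actual folder. `count_backups` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: for each top-level directory (assumed to be one vBucket
# slot) under the given backup root, count nested folders whose names end in
# -full and -diff (assumed naming of full and differential backups).
count_backups() {
    root="$1"
    for slot in "$root"/*/; do
        # Skip the unexpanded pattern when the root has no subdirectories.
        [ -d "$slot" ] || continue
        fulls=$(find "$slot" -type d -name '*-full' | wc -l | tr -d ' ')
        diffs=$(find "$slot" -type d -name '*-diff' | wc -l | tr -d ' ')
        echo "$(basename "$slot") full=$fulls diff=$diffs"
    done
}

# Inspect the given directory, or backup/ by default.
count_backups "${1:-backup}"
```

If the guess is right, a slot that was empty during the first full backup should show a full backup folder that was only created during the differential run.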
Does anyone have any idea whether this is expected behavior or a bug? I haven’t been able to find a lot of information on this topic.