Cbbackupmgr restore is successful, but no data is restored. The buckets are empty

I’m using the latest version of Windows Docker (with WSL2) and Couchbase community 7.0.2. I’m trying to update our backup and restore scripts to use cbbackupmgr since the old tools are deprecated. I have successfully created a backup using this command from inside the Docker container containing Couchbase:

/opt/couchbase/bin/cbbackupmgr backup -a /usr/local/halix/backup -r prod -c http://localhost:8091 -u $DB_USERNAME -p $DB_PASSWORD --full-backup

I am attempting to now restore the backup on the same server, but a different Docker container containing a fresh Couchbase instance and empty buckets. I’m using these commands from inside the Docker instance:

couchbase-cli cluster-init --cluster-username=$DB_USERNAME --cluster-password=$DB_PASSWORD --cluster-ramsize=1024 --cluster-index-ramsize=256 --services=data,index,query

couchbase-cli bucket-create -c 127.0.0.1:8091 -u $DB_USERNAME -p $DB_PASSWORD --bucket=halix2 --bucket-ramsize=768 --bucket-type=couchbase

couchbase-cli bucket-create -c 127.0.0.1:8091 -u $DB_USERNAME -p $DB_PASSWORD --bucket=halixsessions --bucket-ramsize=128 --bucket-type=couchbase

couchbase-cli bucket-create -c 127.0.0.1:8091 -u $DB_USERNAME -p $DB_PASSWORD --bucket=usage --bucket-ramsize=128 --bucket-type=couchbase

cbbackupmgr restore -a /usr/local/halix/backup -r prod -c http://localhost:8091 -u $DB_USERNAME -p $DB_PASSWORD

The command reports success, and all 3 buckets are listed as successful. However, 0 documents were inserted into each of the buckets. When I go to verify the backup data, everything looks fine. I run the command:

cbbackupmgr info -a "/usr/local/halix/backup" -r prod --all

It lists 1 repo called prod, size 2.59GB, containing 1 backup: 2023-05-09T16_24_07.0331632Z, 2.59GB, full backup, complete=true. It lists the 3 buckets I expect: halix2, halixsessions, and usage, with about 2.5 million, 17, and 608 documents respectively, and 153, 2, and 0 indexes respectively.

When I run the restore command I listed above, it completes in about 400ms. It says “Restore completed successfully”. However, the amount transferred per bucket is 0B. It appears to be ignoring the data. I’ve tried a number of combinations of flags: specifying just one bucket, allowing the command to create the buckets in case there was a mismatch, --force-updates, and changing the server URL to values like “couchbase://127.0.0.1”, “couchbase://127.0.0.1:8091”, “couchbase://localhost”… etc. Some of them work, but with the same result of no records being inserted. Others fail with an invalid host error. This leads me to believe it isn’t an issue with connecting to the local Couchbase instance.
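Because the restore reports success even when nothing is transferred, a restore script may want to check the bucket item counts itself instead of trusting the tool's message. Below is a minimal sketch of such a check. It assumes the bucket details returned by the REST endpoint `/pools/default/buckets/<bucket>` include an `itemCount` field in compact JSON, and that `curl` is available in the container. The function names `parse_item_count`, `bucket_item_count`, and `check_restored` are hypothetical helpers, not part of any Couchbase tool.

```shell
#!/usr/bin/env bash
# Sketch: fail loudly when a bucket is still empty after a restore.

# Read a bucket-details JSON document on stdin and print its first itemCount value.
# Assumes compact JSON such as ..."itemCount":123,...
parse_item_count() {
    grep -o '"itemCount":[0-9]*' | head -n 1 | cut -d: -f2
}

# Fetch the current item count for one bucket from the local node.
bucket_item_count() {
    local bucket="$1"
    curl -s -u "$DB_USERNAME:$DB_PASSWORD" \
        "http://localhost:8091/pools/default/buckets/$bucket" | parse_item_count
}

# Return non-zero if the bucket holds fewer items than the expected minimum.
check_restored() {
    local bucket="$1" expected_min="$2" actual
    actual="$(bucket_item_count "$bucket")"
    if [ -z "$actual" ] || [ "$actual" -lt "$expected_min" ]; then
        echo "bucket $bucket has ${actual:-unknown} items, expected at least $expected_min" >&2
        return 1
    fi
}

# Example, run right after cbbackupmgr restore:
#   check_restored halix2 1 || exit 1
```

A threshold of 1 only catches the "completely empty" case described here; a stricter script could pass the document count reported by cbbackupmgr info instead.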

I’ve also verified through the web UI that the 3 buckets exist and are completely empty. There are no documents in any of them. I’ve also verified that the backup directory permissions are ok, adding chmod 777 to it. In order to transfer the backup files from one container to another, I zip up the directory, ship it to AWS S3, download it into the new container, and unzip it to the same directory. The backup is 1.1GB, extracts to ~2.7GB. When I browse the directory structure it looks fine. I opened up the /logs/backup-0.log and went through the very nicely detailed output of the backup process. I noticed nothing of concern. A few "WARN"s for missing services; ‘Search’, ‘Eventing’, ‘Analytics’. I see it spitting out streams for the 3 buckets and at the end it says transfer cluster complete, transfer all data complete, backup completed successfully.
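Since the archive travels through a zip, an S3 upload, a download, and an unzip, one thing that can be ruled out cheaply is damage in transit. The sketch below records a SHA-256 checksum next to the zip before upload and verifies it after download. It assumes GNU `sha256sum` is available in both containers; the function names and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect a zip that changed between the source and target containers.

# Write <archive>.sha256 next to the archive (run before uploading).
record_checksum() {
    local archive="$1" dir name
    dir="$(dirname "$archive")"
    name="$(basename "$archive")"
    ( cd "$dir" && sha256sum "$name" > "$name.sha256" )
}

# Re-hash the archive and compare with <archive>.sha256 (run after downloading).
# Returns 0 when the checksum matches, non-zero otherwise.
verify_checksum() {
    local archive="$1" dir name
    dir="$(dirname "$archive")"
    name="$(basename "$archive")"
    ( cd "$dir" && sha256sum -c "$name.sha256" > /dev/null )
}

# Example:
#   record_checksum /tmp/backup.zip     # source container, upload both files
#   verify_checksum /tmp/backup.zip     # target container, before unzipping
```

A matching checksum only shows that the zip arrived intact; it says nothing about whether the backup inside it was restorable in the first place.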

Both Docker containers are using the exact same Couchbase community 7.0.2 image and thus the exact same version of cbbackupmgr on both sides. I am at an absolute loss as to what might be going on. I don’t see a verbose mode option for the tool. Is there anywhere I can see verbose logging about the restore process? Maybe in the Couchbase server logs themselves? Could the fact that I’m using Docker with WSL2 and Linux containers be leading to anything odd? I mean, as far as I can tell the backup data is there and readable. The restore tool throws no errors of any kind. I’m hoping someone has suggestions I haven’t tried yet. Tomorrow I’m going to begrudgingly pull in a colleague who uses a Mac and have him try my modified scripts to create a backup of one of his databases and then try to restore it in a new container… see if he encounters the same issue. I have a feeling it’ll just work for him though :/. Any ideas would be greatly appreciated.

So this morning I decided to create the backup again, using a different Docker Couchbase container. Result of the backup seemed about the same, almost the same size, about 1.1GB zipped. Difference is the restore worked and actually restored the data. So the issue was something about the first backup itself.

I compared the logs of the successful backup with those of the previous one, which looked successful but doesn’t actually restore properly. The command was identical. The newer log didn’t have the WARN messages that I saw in the older one, though. I’m not sure why those would matter, but they are what stands out as the difference. Here’s the block of logs with the warnings.

2023-05-09T16:24:07.055+00:00 WARN: (REST) (Attempt 1) (GET) Request to endpoint '/api/v1/backup' failed due to error: failed to prepare request: failed to get host for service 'Analytics': failed to get host for service 'Analytics': Analytics Service is not available -- logging.(*ToolsCommonLogger).Log() at tools_common.go:28
2023-05-09T16:24:07.055+00:00 (Couchbase) (Source) Will not be transferring Analytics Cluster Metadata because Analytics Service is not available
2023-05-09T16:24:07.055+00:00 (Plan) (Analytics) Successfully transferred Analytics metadata | {"number":2,"duration":"325.2µs"}
2023-05-09T16:24:07.057+00:00 (Plan) (Eventing) Transferring Eventing metadata
2023-05-09T16:24:07.057+00:00 WARN: (REST) (Attempt 1) (GET) Request to endpoint '/api/v1/backup' failed due to error: failed to prepare request: failed to get host for service 'Eventing': failed to get host for service 'Eventing': Eventing Service is not available -- logging.(*ToolsCommonLogger).Log() at tools_common.go:28
2023-05-09T16:24:07.057+00:00 (Couchbase) (Source) Will not be transferring Eventing Metadata because Eventing Service is not available
2023-05-09T16:24:07.057+00:00 (Plan) (Eventing) Successfully transferred Eventing metadata | {"number":3,"duration":"52.5µs"}
2023-05-09T16:24:07.061+00:00 (Plan) (Search) Transferring full text index aliases
2023-05-09T16:24:07.061+00:00 WARN: (REST) (Attempt 1) (GET) Request to endpoint '/api/v1/backup' failed due to error: failed to prepare request: failed to get host for service 'Search': failed to get host for service 'Search': Search Service is not available -- logging.(*ToolsCommonLogger).Log() at tools_common.go:28

The backup that fails to restore does have this line, which confirms that the data for the bucket I’m most concerned with (halix2) was backed up properly.

2023-05-09T16:24:46.503+00:00 (Plan) (Data) Successfully transferred key value data for bucket 'halix2' | {"number":23,"duration":"27.8491111s","stats":{"estimated_total_items":2903457,"total_items":2551614,"total_vbuckets":1024,"vbuckets_complete":1024,"bytes_received":2335192821,"snapshot_markers_received":1024,"failover_logs_received":1024,"mutations_received":2511047,"deletions_received":40567,"started_at":1683649458669907000,"finished_at":1683649486503197200,"complete":true}}
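Lines like the one above can be pulled out of a backup log automatically, so that the per-bucket `total_items` figure is on hand to compare against what a later restore actually inserts. The following is a rough sketch, assuming the log file on disk uses plain ASCII quotes and the same line layout as the excerpt above. The function name `backup_item_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "<bucket> <total_items>" for every bucket the backup log
# reports as transferred.

backup_item_counts() {
    local logfile="$1"
    # Keep only the per-bucket completion lines, then extract the bucket name
    # and the "total_items" value. The [^_a-z] guard stops the pattern from
    # matching "estimated_total_items" instead.
    grep 'Successfully transferred key value data for bucket' "$logfile" |
        sed -n "s/.*for bucket '\([^']*\)'.*[^_a-z]total_items[^0-9]*\([0-9][0-9]*\).*/\1 \2/p"
}

# Example:
#   backup_item_counts /usr/local/halix/backup/logs/backup-0.log
```

For the halix2 line shown above this would print the bucket name followed by 2551614, which is the number a restore of that backup would be expected to approach.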

So I’m not sure why that data still didn’t restore for that backup. I’d love to know what happened because it is of course essential to be able to trust our backups when I move these scripts over into production. We cannot afford to have random backups looking ok, but DOA if we need to restore it at some point. I cannot upload the logs unfortunately as a new user, but would those WARN messages above explain why the backup won’t restore? What else from the log should I look at that might explain why? Thanks.

Hi @bwilliams,

Thanks for the extensive write-up of your debugging process, it’s extremely useful!

I’m using these commands from inside the Docker instance

To confirm, is each of your commands being run inside the Docker container, using a command such as docker exec -it <id> bash?

/usr/local/halix/backup

How is this directory being accessed in the Docker container? Are you passing the directory through as a bind mount?

$ curl -u <username>:<password> localhost:8091/pools
{"isAdminCreds":true,"isROAdminCreds":false,"isEnterprise":true,"allowedServices":["kv","n1ql","index","fts","cbas","eventing","backup"],"isDeveloperPreview":false,"packageVariant":"linux/docker","pools":[{"name":"default","uri":"/pools/default?uuid=792a10e8eeebcec9a177202739d3229d","streamingUri":"/poolsStreaming/default?uuid=792a10e8eeebcec9a177202739d3229d"}],"settings":{"maxParallelIndexers":"/settings/maxParallelIndexers?uuid=792a10e8eeebcec9a177202739d3229d","viewUpdateDaemon":"/settings/viewUpdateDaemon?uuid=792a10e8eeebcec9a177202739d3229d"},"uuid":"792a10e8eeebcec9a177202739d3229d","implementationVersion":"7.1.3-3479-enterprise","componentsVersion":{"asn1":"5.0.18","crypto":"5.0.6.3","chronicle":"0.0.1","public_key":"1.12.0.1","os_mon":"2.7.1","inets":"7.5.3.1","kernel":"8.3.2.1","stdlib":"3.17.2","ale":"0.0.0","lhttpc":"1.3.0","sasl":"4.1.2","ssl":"10.7.3.3","ns_server":"7.1.3-3479-enterprise"}}

Please could you provide the output of the previous command for both the clusters you have running.

When I run the restore command I listed above, it completes in about 400ms.

If you still have this backup archive, please could you collect the logs using `cbbackupmgr collect-logs` and share them?

Thanks in advance,
James