Cbbackupmgr is failing with "database is locked" error

Hello there,

Running into some issues performing a backup of a Couchbase server.

Server version:
Enterprise Edition 6.6.0 build 7909

Backups are being run from a client using the couchbase:6.6.5 Docker image.

Full error message:

2022-05-04T00:38:44.396+00:00 (DCP) (data) (vb 321) Creating DCP stream {"uuid":200414331219382,"start_seqno":0,"end_seqno":5145,"snap_start":0,"snap_end":0,"retries":0}
2022-05-04T00:38:44.607+00:00 WARN: (DCP) (data) (vb 589) Received an unexpected error from the sink callback, beginning teardown: failed to open vBucket: failed to open vBucket 589: failed to open vBucket 589: failed to open index: failed to set 'user_version': database is locked -- couchbase.(*DCPAsyncWorker).handleDCPError() at dcp_async_worker.go:554
2022-05-04T00:38:44.607+00:00 (Stats) Stopping stat collection
2022-05-04T00:38:44.608+00:00 (DCP) (fdata) (vb 589) Stream closed because all items were streamed | {"uuid":51986546345577,"snap_start":0,"snap_end":6220,"snap_complete":true,"last_seqno":0,"retries":0}
2022-05-04T00:38:44.608+00:00 (DCP) (data) (vb 604) Stream closed because all items were streamed | {"uuid":88333266527,"snap_start":0,"snap_end":0,"snap_complete":true,"last_seqno":0,"retries":0}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xa8e919]

goroutine 306 [running]:
github.com/couchbase/backup/storage.(*RiftDB).Commit(0xc0000e39a0, 0xc000ae6e58, 0xc000ae6e60)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/rift.go:339 +0xc9
github.com/couchbase/backup/storage.(*VBucketBackupWriter).closeVBucket(0xc0008d3500, 0xc00044024d, 0x8001, 0xc000449790)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:498 +0x20a
github.com/couchbase/backup/storage.(*VBucketBackupWriter).CloseVBuckets.func1(0xc00033bf40, 0xc000ae6e00, 0xc0008d3500, 0xc0004a4360)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:163 +0x9c
created by github.com/couchbase/backup/storage.(*VBucketBackupWriter).CloseVBuckets
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:159 +0x161
2022-05-04T00:38:56.026+00:00 (Cmd) cbbackupmgr version 6.6.5-10080 Hostname: couchbase-daily-backup-5d7667b8cb-tz5ng OS: linux Version: 4.18.0-305.40.2.el8_4.x86_64 x86_64 Arch: amd64 vCPU: 16 Memory: 115497200

Wondering if someone has ever seen this problem and could point me in the right direction?

Thank you!

Have you looked at opening a support ticket on this issue?

-Aaron

Thank you @biozal for your reply. I do not have support available.

Hi @rubenp,

This is an interesting issue, and one that (I believe) we’ve seen before. Unfortunately, it’s symptomatic of a couple of different issues, so we’ll probably need some more information to narrow it down.

Please could you provide the following information:

  1. What storage are you using? Is it EFS/NFS?
  2. Could you provide/share a log collection (collected via cbbackupmgr collect-logs; see the example below)?
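
For reference, the log collection can be taken by pointing collect-logs at the backup archive with something along these lines (the archive and output paths here are just placeholders for wherever your backups and logs live):

    cbbackupmgr collect-logs --archive /path/to/backup/archive --output-dir /tmp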

Generally, we’ve seen this issue with EFS, which limits the number of locks held at once by a single process to 256; please see MB-51772 for more information.
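
If you’re not sure what the backup directory is mounted on, checking the filesystem type is a quick way to confirm, for example (the path is a placeholder):

    df -hT /path/to/backup/archive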

For those who are interested, the panic seen here is a symptom of the failure, not its cause, and has been fixed in the 7.0.0 release of Couchbase (see MB-44020 for more information).

Thanks,
James

Thank you for your suggestions @jamesl33.

One additional observation: I have downgraded the client image to 6.6.0 to match the patch version running on the server, and I am no longer observing the locking issue. However, all other symptoms of the backup failure are still present, and I can still see the Go nil pointer dereference:

2022-05-04T21:07:54.152+00:00 (DCP) (data) (vb 0) Creating DCP stream with start seqno 0, end seqno 4496, vbuuid 203389256134838, snap start seqno 0, snap end seqno 0
2022-05-04T21:07:54.163+00:00 (DCP) (data) (vb 341) Creating DCP stream with start seqno 0, end seqno 4726, vbuuid 172733936097566, snap start seqno 0, snap end seqno 0
2022-05-04T21:07:54.183+00:00 (DCP) (data) (vb 1) Creating DCP stream with start seqno 0, end seqno 4226, vbuuid 71757318371847, snap start seqno 0, snap end seqno 0
2022-05-04T21:07:59.249+00:00 WARN: (DCP) (data) (vb 0) Stream closed due to unexpected error 'EOF | {"bucket":"data","last_dispatched_to":"10.78.220.5:11210","last_dispatched_from":"172.31.31.117:54950","last_connection_id":"6cc61d1debe95067/ea1961d9fb1f670a"}' -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:439
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xa5a989]

goroutine 131 [running]:
github.com/couchbase/backup/storage.(*RiftDB).Commit(0xc000333e00, 0xc000488060, 0xc000488060)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/rift.go:339 +0xc9
github.com/couchbase/backup/storage.(*VBucketBackupWriter).closeVBucket(0xc00144c000, 0xc000440000, 0x1, 0x4)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:498 +0x1eb
github.com/couchbase/backup/storage.(*VBucketBackupWriter).CloseVBuckets.func1(0xc0002efdc0, 0xc000488000, 0xc00144c000, 0xc0003109c0)
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:163 +0x9c
created by github.com/couchbase/backup/storage.(*VBucketBackupWriter).CloseVBuckets
        /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/backup/storage/vbucket_backup_writer.go:159 +0x161
2022-05-04T21:08:08.752+00:00 (Cmd) cbbackupmgr version Unknown Hostname: couchbase-daily-backup-6c7c59b4d7-45729 OS: linux Version: 4.18.0-305.40.2.el8_4.x8

Hi @rubenp,

It looks like you’ve hit a separate issue there (the EOF), which could have been triggered for a number of reasons; it may well be an issue that has already been addressed in a later version.

The panic here is happening for the same reason: an unexpected error triggers teardown, which results in closing a vBucket that had never been opened.
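
To illustrate the mechanism, here’s a contrived sketch (not the actual cbbackupmgr code; all of the names are made up): teardown closes every vBucket, including ones whose backing store was never created, so the commit ends up being called through a nil handle.

    // Contrived sketch of the failure mode: closing a vBucket that was never
    // opened dereferences a nil store handle and panics, much like the
    // RiftDB.Commit frame in the traces above.
    package main

    import "fmt"

    type riftStore struct{ committed bool }

    // Commit dereferences its receiver, so calling it on a nil *riftStore panics.
    func (r *riftStore) Commit() { r.committed = true }

    type vbWriter struct {
        stores map[uint16]*riftStore // no entry for vBuckets that were never opened
    }

    func (w *vbWriter) closeVBucket(vb uint16) {
        db := w.stores[vb] // a missing key yields a nil pointer
        db.Commit()        // panics when the vBucket was never opened
    }

    func main() {
        w := &vbWriter{stores: map[uint16]*riftStore{0: {}}}
        w.closeVBucket(0) // fine: vBucket 0 has a store

        defer func() {
            // Prints: runtime error: invalid memory address or nil pointer dereference
            fmt.Println("recovered:", recover())
        }()
        w.closeVBucket(589) // never opened, so this dereferences nil and panics
    }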

To further investigate the EOF error, we’d need the cluster logs as well as the logs from cbbackupmgr. Please let me know which steps you plan to take next and we can try to help debug the issue.

Thanks,
James

Hi @jamesl33,

Thank you for your valuable feedback.

I was able to remediate the problem by switching the backup destination directory from a CIFS mount (Azure Files) to block-based storage (Azure Disk).

Hopefully, someone else will find this helpful.

Cheers,
Ruben
