cbbackup - memory usage

I’ve set up XDCR from the production cluster to a backup node and use cbbackup to dump all buckets on a daily basis. The Couchbase data directory is around 55GB. When I run cbbackup, its memory usage goes through the roof: it uses up to 60GB of memory, far beyond the machine’s RAM (16GB, of which 7GB is used by Couchbase itself), so it ends up swapping excessively. This makes the backup process very slow, around 20 hours. Disk I/O (on the data partition) and CPU usage are both low; memory usage (and, because of the swapping, I/O on the swap drive) is clearly the bottleneck.
Is there any way to reduce cbbackup’s memory usage?
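
For reference, the invocation is essentially the following (host, credentials and target path are placeholders, not the real ones):

# daily dump of all buckets on the backup node
cbbackup http://localhost:8091 /data/backups -u Administrator -p password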

Looking at other datastores I’ve used before (*SQL, MongoDB), backing up 50GB of data was never a problem…

I’m running on Amazon EC2 - the backup host is an m3.xlarge instance, which provides two 40GB ephemeral drives. I’m using a RAID 0 of them as the swap volume.
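
In case it matters, the swap array was assembled along these lines (device names match the ones below; mdadm and mkswap are standard Linux tools):

# stripe the two 40GB ephemeral SSDs into one RAID 0 device
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
# format and enable it as swap
mkswap /dev/md0
swapon /dev/md0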

Output of the dstat command is available on a gist. xvdb and xvdc are the ephemeral local SSDs used for swap; xvdf holds the Couchbase data and the backup output (with an I/O capacity of 1500 IOPS).
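
The numbers were collected with a dstat invocation roughly like this (the exact flags are an approximation):

# per-device throughput for swap (xvdb, xvdc) and data (xvdf), 5s samples
dstat -tcm -d -D xvdb,xvdc,xvdf 5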


I am experiencing this same issue using version 3.0.1 Community on CentOS 6.5 at Rackspace, Python version “python.x86_64 2.6.6-52.el6”. I run this on one of the Couchbase nodes with the --bucket option to back up a specific bucket from all nodes’ data (not single-node mode).
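
This is roughly the command, with bucket name and credentials as placeholders:

# -b / --bucket-source selects the bucket; without --single-node,
# data for that bucket is pulled from all nodes in the cluster
cbbackup http://localhost:8091 /backups -u Administrator -p password -b mybucket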

The cbbackup utility just consumes all available memory until the VM’s kernel OOM-kills it, along with memcached (a possible stopgap is sketched after the logs below).

Tasks: 242 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.7%us,  5.2%sy,  0.0%ni, 86.1%id,  0.8%wa,  0.0%hi,  0.0%si,  1.2%st
Mem:  30822556k total, 28809180k used,  2013376k free,   114604k buffers
Swap:        0k total,        0k used,        0k free,  5510528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25979 couchbas  20   0 16.9g  16g 6468 S 14.6 56.4  17:07.67 memcached
26840 root      20   0 5474m 4.6g 8012 S 200.8 15.7  75:39.92 python
14072 couchbas  20   0 2148m 331m 1964 S  7.0  1.1 939:51.42 beam.smp
14043 couchbas  20   0 1275m  23m 1232 S  0.3  0.1  18:50.06 beam.smp 

Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 14119 (memcached) score 567 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 14119, UID 497, (memcached) total-vm:17794644kB, anon-rss:17453500kB, file-rss:8kB
Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 4630 (python) score 320 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 4630, UID 0, (python) total-vm:11581564kB, anon-rss:10773124kB, file-rss:4kB
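
As a stopgap, assuming you would rather have cbbackup die than have it take memcached down with it, capping its address space in the invoking shell works; the 6GB figure is only an example:

# cap cbbackup's virtual memory so Python fails with a MemoryError
# before the kernel OOM killer has to pick a victim
ulimit -v 6291456    # in KB, ~6GB; applies to this shell and its children
cbbackup http://localhost:8091 /backups -u Administrator -p password -b mybucket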

Sorry to hear that you’re encountering OOM kills of the memcached process. It would be great if you could share the cbcollect_info log from the node where the OOM killer kicked in.

I don’t think this has anything to do with Couchbase Server itself. As it happened to both me and emccormick, the problem is that the cbbackup process starts to consume all available memory, which eventually causes an out-of-memory situation and, quite normally, triggers the OOM killer.

From my experiments, I have figured out that cbbackup loads the whole contents of a bucket into memory while dumping it. On large clusters this simply renders cbbackup useless, which is a great pity, because it becomes impossible to dump the data. Other solutions have been suggested to me (such as XDCR to another cluster for backup plus volume snapshots), but they are not ideal for many reasons - and we would love to be able to use the cbbackup tool anyway!
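
A sketch of a partial mitigation, assuming the usual -x transfer options (batch_max_size, batch_max_bytes) are honored by this version: dump buckets one at a time with smaller batches, so at most one bucket’s backfill is in flight. Bucket names and values here are examples only:

# one bucket per run, with smaller transfer batches
for bucket in users sessions events; do
  cbbackup http://localhost:8091 /backups \
    -u Administrator -p password \
    -b "$bucket" \
    -x batch_max_size=500,batch_max_bytes=200000
done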

@korekontrol please refer to our release notes for v3.0.1, where we have documented steps to mitigate “hard OOM” on the server side - Release notes v3.x

The new --sequential option lets you control the number of vbucket backfills in flight at any given time; it mitigates “hard OOMs”, but it is a bit slower.
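
A minimal example, assuming the flag takes no argument and the rest of the command line is unchanged (host and paths are placeholders):

# sequential vbucket backfills: lower peak memory, longer runtime
cbbackup http://localhost:8091 /backups -u Administrator -p password --sequential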

@asingh, thank you for this note. We’ll try it next time.