My team has been running Couchbase on VMs for a while now, and we recently enabled disk I/O throttling to stop “noisy neighbor” problems for other VMs on the same physical machine. However, this has landed us in heaps of trouble: Couchbase will randomly (and sometimes not so randomly) start demanding a very high number of disk I/O operations, get throttled, and then stop serving requests until it has worked through its disk write queue.
It’s frustrating because Couchbase seems to use far more disk I/O than the databases other teams at our company are running. For example, MongoDB’s documentation makes it clear that if the entire working set fits in RAM, disk operations are fairly limited. That is harder to achieve with Couchbase, since using XDCR means that each DC needs to cache the working set of every other DC in memory.
Couchbase isn’t as well documented, but it seems like there should be similar tips for reducing its disk I/O. For example, I read that setting read_ahead to zero on the VM is a good idea for MongoDB. Would you expect that to help Couchbase as well? Are there any other known ways to reduce the amount of disk I/O that Couchbase uses?
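To make the read-ahead question concrete, here is a minimal sketch of how the current value could be checked (and changed) on a Linux VM. It assumes the standard sysfs file /sys/block/<device>/queue/read_ahead_kb; the device name "sda" and the helper names are placeholders of mine, and I have not confirmed that changing this value helps Couchbase.

```python
from pathlib import Path


def read_ahead_path(device: str, sysfs_root: str = "/sys/block") -> Path:
    """Build the sysfs path that holds the read-ahead size for a block device."""
    return Path(sysfs_root) / device / "queue" / "read_ahead_kb"


def get_read_ahead_kb(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the current read-ahead size in KiB for the given device."""
    return int(read_ahead_path(device, sysfs_root).read_text().strip())


def set_read_ahead_kb(device: str, kb: int, sysfs_root: str = "/sys/block") -> None:
    """Write a new read-ahead size in KiB (needs root on a real system)."""
    if kb < 0:
        raise ValueError("read-ahead size must be non-negative")
    read_ahead_path(device, sysfs_root).write_text(f"{kb}\n")


if __name__ == "__main__":
    # "sda" is only a placeholder; substitute the device backing the data path.
    device = "sda"
    if read_ahead_path(device).exists():
        print(f"{device}: read_ahead_kb = {get_read_ahead_kb(device)}")
    else:
        print(f"{device}: no read_ahead_kb file found")
```

A value written this way does not survive a reboot, so it is only useful for experimenting before making the change permanent some other way.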
For reference, we’re on Couchbase Community Edition 5.1.1, running XDCR in a bidirectional ring between 5 datacenters around the world. The high disk I/O seems to be triggered by any number of things: high read/write volume, a high cache miss ratio, routine DB operations (I think compaction may have caused it?), or high write volume in a different DC that then propagates through XDCR.
Any suggestions are very welcome. Thanks for taking the time.