Data de-duplication on the disk queue


I believe that multiple outstanding updates to the same document will be de-duplicated on the replication queue. This is obviously a good thing!

Does the same thing happen on the local disk queue? We’re building an app which may have a large number of updates to the same document in quick succession. From our testing it looks like this is causing quite a lot of fragmentation and requiring compaction to be run more often.

If documents were de-duplicated on the disk queue, this might help us. Also, is there any way to delay writes to disk, i.e. force CB to wait, say, a few seconds before flushing new data to disk? This would presumably also help (assuming the docs are de-duped in the first place).


Yes, documents are de-duplicated on the disk queue to optimize disk-queue throughput. You can read more in the Couchbase Server Under the Hood whitepaper.
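To illustrate the general idea, here is a minimal Python sketch of a write queue that keeps only the latest pending value per document key. This is not Couchbase's actual implementation; the `DedupQueue` class, its methods, and the `doc::1` / `doc::2` keys are hypothetical names made up for this example.

```python
from collections import OrderedDict


class DedupQueue:
    """Hypothetical write queue holding at most one pending value per key."""

    def __init__(self):
        # Maps document key -> latest value not yet flushed.
        self._pending = OrderedDict()

    def enqueue(self, key, value):
        # If the key is already queued, the newer value replaces the older
        # one, so only a single write per key reaches the flush step.
        self._pending[key] = value

    def flush(self):
        # Hand back everything queued so far and empty the queue.
        batch = list(self._pending.items())
        self._pending.clear()
        return batch


queue = DedupQueue()
for i in range(5):
    queue.enqueue("doc::1", {"counter": i})  # five rapid updates to one doc
queue.enqueue("doc::2", {"counter": 0})

batch = queue.flush()
print(len(batch))  # 2 entries are flushed, not 6
```

In this sketch, the longer updates sit in the queue before a flush, the more of them collapse into a single write, which is why delaying the flush and de-duplication go hand in hand.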

Hope that helps!