Compaction internals
I have studied the compaction process described in the docs, but I'm confused about the new file that is created during compaction and then used for active data. Can anyone explain how the old and new data files are used for client data operations?

(I’m assuming you’re asking about Key-Value or Map/Reduce - the newer GSI indexing may be different).

The couchbase file format (couchstore) is append-only - data is only ever written to the end of the file, and once written previous blocks are immutable.

This has a number of helpful properties, including resilience to corruption: if a write fails, the disk dies, or the process crashes in the middle of a write, the worst that should happen is that the very last chunk of data written to the file is lost - all previous data should be intact.
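As a toy sketch of why this works (this is not couchstore's real record layout - the record format, `append_record`, and `read_records` names here are made up for illustration), imagine each record is appended with a length and checksum header. On recovery, a scan from the start simply stops at the first torn or corrupt record, so only an incomplete final write is discarded:

```python
import os
import struct
import zlib

# Hypothetical record header: payload length + CRC32 of payload (big-endian)
HEADER = struct.Struct(">II")

def append_record(path: str, payload: bytes) -> None:
    """Append one record; data is only ever written at the end of the file."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

def read_records(path: str) -> list:
    """Scan from the start, stopping at the first torn/corrupt record.

    At worst, only the very last (partially written) record is lost;
    everything before it is still readable.
    """
    records = []
    with open(path, "rb") as f:
        while True:
            hdr = f.read(HEADER.size)
            if len(hdr) < HEADER.size:
                break  # incomplete header at the tail - stop here
            length, crc = HEADER.unpack(hdr)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt tail - discard and stop
            records.append(payload)
    return records
```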

One downside to this approach is that the data files grow every time any change is made, since we only ever append - the old contents of a particular document will still be present somewhere earlier in the file.
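As a rough illustration, "fragmentation" here is just the fraction of the file occupied by stale (superseded or deleted) data. The function below is a made-up sketch, not Couchbase's actual metric:

```python
def fragmentation_ratio(file_size: int, live_data_size: int) -> float:
    """Fraction of the file that is stale data rather than live documents."""
    if file_size == 0:
        return 0.0
    return (file_size - live_data_size) / file_size

# e.g. a 100 MB file whose live documents total only 40 MB is 60% fragmented
```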

To solve this, the extent of file fragmentation is monitored, and when it reaches a given threshold a new, compacted file is created side by side with the existing, fragmented file. Once the new compacted file has been created (and contains up to the same sequence number as the previous file), the old and new files are atomically "swapped" - all data access (both reads and writes) is then done to the new file.
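The swap step above can be sketched like this (again a toy model, not couchstore's implementation - the `compact` function and record layout are invented for illustration). The key idea is that the compacted file is built off to the side and only swapped in atomically once complete:

```python
import os

def compact(path: str, live_docs: dict) -> None:
    """Write only the live (latest) version of each document to a new file
    alongside the fragmented one, then atomically swap the files.

    After the swap, all subsequent reads and writes go to the new file.
    """
    tmp = path + ".compact"
    with open(tmp, "wb") as f:
        for key, value in sorted(live_docs.items()):
            # hypothetical record layout: key, NUL separator, value, newline
            f.write(key.encode() + b"\x00" + value + b"\n")
        f.flush()
        os.fsync(f.fileno())  # make the new file durable before the swap
    # Atomic replace: readers observe either the old file or the new one,
    # never a half-written mixture.
    os.replace(tmp, path)
```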

So there is only ever one file which is actively used by the storage layer - we just "flip" one out for the other when the current file gets too large (i.e. fragmented).

The low-level details of couchstore are at if you’re interested.

Thanks drigby… Understood the low-level internals of the append-only file…