Fragmentation - 30% of what

Hello,

At 30% fragmentation, the compaction process kicks in. What I am trying to understand is: 30% of what? Is it 30% of total disk space, 30% of total used space (i.e. space occupied by Couchbase docs), 30% of the documents themselves, or something else?

Thanks

@pccb,

Here is a link that will answer your question.
https://developer.couchbase.com/documentation/server/3.x/admin/Tasks/tasks-compaction-process.html

I had a look and it does not answer my question.

Since CB uses an append-only storage engine, the 30% refers to the share of the stored data that consists of old revisions/versions of documents, i.e. data that has since been superseded by a newer write.

In this image, a piece of data is written to disk. Since it is the first and only document, the fragmentation percentage is 0%.

In the image below, two new writes have come in: one is a new document and the other is a newer version of the old document. So right now the fragmentation is 50%.

Compaction would happen because it's greater than 30%. The file would eventually look like the one below.
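
To put numbers on the example, here is a minimal sketch of the calculation. It assumes the fragmentation stat is derived as (size on disk - live data size) / size on disk, which is my understanding of how the UI figure is computed, and it assumes all documents are the same size. The function name is made up for illustration. Under that formula the three-entry file comes out at roughly 33% rather than 50% (the 50% above appears to count the one stale version against the two live documents). Either way it is above the 30% threshold.

```python
def fragmentation_pct(data_size: float, disk_size: float) -> float:
    """Percentage of the on-disk file taken up by stale (superseded) data.

    Assumed formula: (disk_size - data_size) / disk_size * 100.
    """
    if disk_size <= 0:
        return 0.0
    return 100.0 * (disk_size - data_size) / disk_size


# One document written once: nothing on disk is stale.
print(fragmentation_pct(data_size=1, disk_size=1))  # 0.0

# After the two extra writes: 3 entries on disk, 2 of them live.
print(round(fragmentation_pct(data_size=2, disk_size=3), 1))  # 33.3
```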

So if you have 5 million documents and only 10,000 get updated per day, compaction would probably not happen for several months.
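
A rough back-of-the-envelope version of that estimate, as a sketch only. It assumes equal-sized documents, that every update leaves exactly one stale copy on disk, that no compaction runs in the meantime, and that fragmentation is stale data divided by everything on disk. The function name is hypothetical.

```python
import math


def days_until_threshold(total_docs: int, updates_per_day: int,
                         threshold: float = 0.30) -> int:
    """Days of updates needed before stale / on_disk reaches the threshold.

    After d days: stale = updates_per_day * d, on_disk = total_docs + stale.
    Solving stale / on_disk >= threshold for d gives
    d >= threshold * total_docs / (updates_per_day * (1 - threshold)).
    """
    days = threshold * total_docs / (updates_per_day * (1 - threshold))
    return math.ceil(days)


print(days_until_threshold(5_000_000, 10_000))  # 215
```

So with these simplifying assumptions it takes about seven months of updates to reach 30%.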

Thanks, much appreciated. If I may, is there a way to check the current fragmentation? E.g. in the above example, after the 2 new writes come in, would it show me 50%?

The image below is of the older Admin GUI, but the stats are the same.

In the 5th row you can see the stats for:

  • data size (row 5, column 1)
  • storage size (row 5, column 2)
  • percentage / fragmentation % (row 5, column 3)

As you can see in (row 1, column 4), there are about 35K writes per second.
Since items (row 2, column 4) is staying steady at 17.5K docs, all the writes must be updates.
In percentage / fragmentation % (row 5, column 3) you can see that the cluster compacted the documents down to zero and fragmentation then climbed back to 30% again, so you should see it seesaw like that. NOTE: since a bucket is made up of 1024 vBuckets/files/shards, not all the files are compacted at the same time. If you look at percentage / fragmentation % (row 5, column 3) per server, you'll see that each node compacts at different times, but on average they all do so at 30%.
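
The seesaw pattern can be reproduced with a toy model. This is only a sketch of a single append-only file, not how the server is implemented: the live data size stays constant because every write is an update, each tick appends new versions, and the file is rewritten to just the live data once fragmentation reaches the threshold. The function name and the sizes are made up for illustration.

```python
def simulate(live_bytes: int, write_bytes_per_tick: int, ticks: int,
             threshold: float = 30.0) -> list:
    """Return the fragmentation % seen after each tick of update-only writes."""
    disk = live_bytes  # freshly compacted file: only live data on disk
    history = []
    for _ in range(ticks):
        # Updates append new versions; the versions they replace become stale.
        disk += write_bytes_per_tick
        frag = 100.0 * (disk - live_bytes) / disk
        if frag >= threshold:
            # Compaction rewrites the file with only the live data.
            disk = live_bytes
            frag = 0.0
        history.append(round(frag, 1))
    return history


print(simulate(live_bytes=100, write_bytes_per_tick=10, ticks=10))
# [9.1, 16.7, 23.1, 28.6, 0.0, 9.1, 16.7, 23.1, 28.6, 0.0]
```

In a real bucket each vBucket file goes through this cycle on its own schedule, which is why the per-node graphs are out of step with each other.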