We are seeing our buckets grow rapidly on disk, apparently because binary attachment data is not cleaned up when the corresponding document is deleted. In our use case, customers create documents containing data, and a server component fetches this data and then removes it from the database. This means many small documents with one or more attachments (image data) are created and then deleted again within a matter of hours, once they have been processed.
Our setup is a Couchbase cluster of 3 nodes, fronted by 3 Sync Gateway instances that are used by the backend server and mobile clients. We can consistently reproduce the issue as follows:
Have a bucket up and running.
Run the following N1QL query: echo 'SELECT * FROM `bucketname`;' | cbq | grep "\u003cbinary" | wc -l
=> This prints the number of binary blobs in the database; let's call this number X.
Insert some documents with attachments into the bucket.
Rerun the above query to get the new number of blobs; let's call this number Y. As expected, Y > X.
Delete the documents
Rerun the query
=> We still get Y. I would assume this is because older revisions of the documents still refer to this attachment.
Run database compaction on the bucket to get rid of old revisions.
Rerun the query
=> I would expect the number to drop back to X, but instead we still get Y.
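For anyone who wants to compare X and Y from saved query output rather than by eye, here is a minimal sketch in Python. It assumes you have captured the raw `cbq` output as text at each step; the function names are hypothetical, and it simply counts lines containing the same `\u003cbinary` marker that the grep above looks for.

```python
# Sketch only: mirrors `grep "\u003cbinary" | wc -l` on captured cbq output.
# The marker is the literal backslash-u escape as it appears in the output text.
MARKER = "\\u003cbinary"


def count_blob_lines(cbq_output: str) -> int:
    """Count output lines that contain the binary-value marker."""
    return sum(1 for line in cbq_output.splitlines() if MARKER in line)


def blob_delta(before: str, after: str) -> int:
    """Difference in blob count between two captured outputs (after minus before)."""
    return count_blob_lines(after) - count_blob_lines(before)
```

With the output before inserting saved as `before` and the output after delete plus compaction saved as `after`, `blob_delta(before, after)` would be 0 if the blobs had been cleaned up; in our case it stays at Y - X.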
Is there a way to get the number of blobs down and free up the disk space they are consuming?
Even after waiting several days (and running compaction), the query still returns the same result, so I don’t think query consistency is the problem here. The bucket also still takes up the same amount of disk space as it did right after the attachments were added, so the blobs really are still there, even after the corresponding documents have been removed.
There isn’t currently an automated task to clean up obsolete attachments. Since attachments could be referenced by multiple revisions of a document (or multiple documents), they don’t get purged when a revision gets deleted (tombstoned).
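To illustrate why this needs a reference check rather than a simple delete-on-tombstone, here is a rough sketch in Python of how one might identify blobs that no remaining revision points to. It rests on assumptions that you should verify against your Sync Gateway version: that attachment blobs live under keys with a `_sync:att:` prefix followed by the digest, and that each revision body carries an `_attachments` map whose entries have a `digest` field. The function names are hypothetical, and the sketch only reports candidates; it does not delete anything and says nothing about whether deleting them is safe.

```python
from typing import Iterable, Mapping, Set

# ASSUMPTION: attachment blobs are stored under this key prefix plus the digest.
ATT_PREFIX = "_sync:att:"


def referenced_digests(revisions: Iterable[Mapping]) -> Set[str]:
    """Collect every attachment digest still referenced by any given revision."""
    digests: Set[str] = set()
    for rev in revisions:
        # ASSUMPTION: attachments are listed as name -> {"digest": ...}.
        for meta in (rev.get("_attachments") or {}).values():
            digest = meta.get("digest")
            if digest:
                digests.add(digest)
    return digests


def orphaned_blob_keys(blob_keys: Iterable[str], revisions: Iterable[Mapping]) -> Set[str]:
    """Return blob keys whose digest is not referenced by any given revision."""
    live = referenced_digests(revisions)
    return {
        key
        for key in blob_keys
        if key.startswith(ATT_PREFIX) and key[len(ATT_PREFIX):] not in live
    }
```

The important part is that `revisions` has to include every revision of every document that could still reference a blob, which is exactly why a single tombstone is not enough to decide that an attachment is obsolete.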
Shouldn’t obsolete attachments get deleted when all documents that have been referencing them have been deleted and a compact cycle has discarded all but the tombstone revision?
Glad to see there is a GitHub ticket about this now. In the meantime, is there any way we can manually remove the binary blobs to free up some disk space? Or will doing this behind Couchbase’s back cause issues?