Index 98% fragmentation? (data 15 GB vs disk size 900 GB)

Community Docker 7.2.2 (ARM)

I have a collection of ~100M documents, 86 GiB on disk (as shown on the Couchbase web dashboard). Each document includes a receive timestamp t_r (a JSON number, a fractional Unix timestamp like 1698253421.123456) and a REST path as a JSON string, indicating where the document was downloaded from. I've created indexes for both of these fields. Surprisingly, the index for the timestamp has ballooned to 900 GB on disk, even though the reported "data size" is just 15 GiB.
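
For reference, the two indexes look roughly like this (the bucket, field, and index names here are placeholders, not the real ones):

CREATE INDEX idx_t_r ON `my_bucket`(t_r);              -- fractional Unix receive timestamp
CREATE INDEX idx_rest_path ON `my_bucket`(rest_path);  -- REST path the document was downloaded from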

98% fragmentation??

The indexer is currently rebuilding the index after what appeared to be ForestDB database corruption. (I had to remove the corrupted file manually because the indexer wouldn't respond to the DROP INDEX command; after that, while the indexer was trying to recreate the removed index, I issued DROP INDEX and then CREATE INDEX again.) According to the web UI, it's about 61% done.
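
For clarity, the manual recovery sequence was roughly this (again with placeholder names; the real index is the one on t_r):

-- after deleting the corrupted .fdb file by hand:
DROP INDEX `my_bucket`.`my_index_name`;
CREATE INDEX `my_index_name` ON `my_bucket`(t_r);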

Question:
Is it normal for the Couchbase indexer to experience such significant on-disk size bloat when rebuilding an index from the ground up?

I might be cutting it too close on resources. My AWS t4g instance has only 8 GB of RAM, and after noticing the Couchbase services overusing memory, I limited the Index service quota to just 1 GB (with 1 GB for Data and 256 MB for Search). Before I set these low limits, the indexer process seemed to get killed a few times.
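
In case it matters, this is roughly how I set those quotas via the cluster REST API (host and credentials are placeholders; values are in MB):

curl -u Administrator:password -X POST http://127.0.0.1:8091/pools/default \
  -d memoryQuota=1024 \
  -d indexMemoryQuota=1024 \
  -d ftsMemoryQuota=256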

Never mind, it turns out the indexer angered k8s by suddenly using too much RAM (*1) and got killed, leaving the index file corrupted again. I'm in the process of rebuilding the index one more time.

*1) Unfortunately, my Prometheus setup missed capturing that spike; the last recorded RAM usage was under 80%. It looks like something odd happened with the indexer within the following 15-second scrape interval.

The size bloat is happening again; fragmentation is at 86% now.

Every container stop (I even run sv stop couchbase-container inside the container, believing it is "more graceful" than a normal k8s pod deletion, but is it really? See the pod-spec sketch below) results in this error:

2023-10-27T14:04:35.674+00:00 [ERRO][FDB] Crash Detected: Last Block not DBHEADER ff in a database file '/opt/couchbase/var/lib/couchbase/data/@2i/my_index_name_7939188921012867285_0.index/data.fdb.107'

and the indexer starts to consume 100% CPU, refusing to serve any requests (until when, I have no idea).
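
If the plain k8s pod deletion is the real culprit, one thing I'm considering is a preStop hook plus a longer termination grace period, roughly like this (a sketch only; I'm assuming the runit service inside the official image is called couchbase-server, and the container/image names are placeholders):

spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: couchbase
      image: couchbase:community-7.2.2
      lifecycle:
        preStop:
          exec:
            command: ["sv", "stop", "couchbase-server"]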

My phone has 4 GB of RAM and 256 GB of storage, and it works fine for me. Processing a 100 GB dataset on an AWS server with 8 GB of RAM should be an easy job for a database product with a decade of history, shouldn't it?
