Index 98% fragmentation? (data 15 GB vs disk size 900 GB)

Community Docker 7.2.2 (ARM)

I have a collection of ~100M documents, 86 GiB on disk (as shown on the Couchbase web dashboard). Each document includes a receive timestamp t_r (a JSON number, a fractional Unix timestamp like 1698253421.123456) and a REST path as a JSON string, indicating where the document was downloaded from. I've created indexes for both of these fields. Surprisingly, the index for the timestamp has ballooned to 900 GB on disk, even though the reported "data size" is just 15 GiB.
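
For reference, the two indexes look roughly like this (the bucket, field, and index names here are placeholders, not the real ones):

CREATE INDEX idx_t_r ON `my_bucket`(t_r);              -- fractional Unix receive timestamp
CREATE INDEX idx_rest_path ON `my_bucket`(rest_path);  -- REST path the document was downloaded from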

98% fragmentation??

The indexer is currently rebuilding the index after what appeared to be ForestDB database corruption. (I had to remove the corrupted file manually because the indexer wouldn't respond to the DROP INDEX command; after that, while the indexer was trying to recreate the removed index, I issued DROP INDEX and then CREATE INDEX again.) According to the web UI, it's about 61% done.
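
For clarity, the manual recovery sequence was roughly this (again with placeholder names; the real index is the one on t_r):

-- after deleting the corrupted .fdb file by hand:
DROP INDEX `my_bucket`.`my_index_name`;
CREATE INDEX `my_index_name` ON `my_bucket`(t_r);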

Question:
Is it normal for the Couchbase indexer to experience such significant on-disk size bloat when rebuilding an index from the ground up?

I might be cutting it too close on resources. My AWS t4g instance has only 8 GB of RAM, and after noticing the Couchbase services overusing memory, I limited the Index service quota to just 1 GB (with 1 GB for Data and 256 MB for Search). Before I set these low limits, the indexer process seemed to get killed a few times.
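
In case it matters, this is roughly how I set those quotas via the cluster REST API (host and credentials are placeholders; values are in MB):

curl -u Administrator:password -X POST http://127.0.0.1:8091/pools/default \
  -d memoryQuota=1024 \
  -d indexMemoryQuota=1024 \
  -d ftsMemoryQuota=256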

Never mind, it turns out the indexer angered k8s by suddenly using too much RAM (*1) and got killed, leaving the index file corrupted again. I'm in the process of rebuilding the index one more time.

*1) Unfortunately, my Prometheus setup missed capturing that spike; the last recorded RAM usage was under 80%. It looks like something odd happened with the indexer within the following 15-second scrape interval.

The size bloat is happening again; fragmentation is at 86% now.

Every container stop (I even run sv stop couchbase-container inside the container, believing it is "more graceful" than a normal k8s pod deletion, but is it really? See the pod-spec sketch below) results in this error:

2023-10-27T14:04:35.674+00:00 [ERRO][FDB] Crash Detected: Last Block not DBHEADER ff in a database file '/opt/couchbase/var/lib/couchbase/data/@2i/my_index_name_7939188921012867285_0.index/data.fdb.107'

and the indexer starts to consume 100% CPU, refusing to serve any requests (until when, I have no idea).
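
If the plain k8s pod deletion is the real culprit, one thing I'm considering is a preStop hook plus a longer termination grace period, roughly like this (a sketch only; I'm assuming the runit service inside the official image is called couchbase-server, and the container/image names are placeholders):

spec:
  terminationGracePeriodSeconds: 120
  containers:
    - name: couchbase
      image: couchbase:community-7.2.2
      lifecycle:
        preStop:
          exec:
            command: ["sv", "stop", "couchbase-server"]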

My phone has 4 GB of RAM and 256 GB of storage, and it works fine for me. Processing a 100 GB dataset on an AWS server with 8 GB of RAM should be an easy job for a database product with a decade of history, shouldn't it?
