Constantly growing plasma index files

gizmo74 · November 6, 2019, 2:40pm

Hi,

We’re using Couchbase 6.0.3 Enterprise Edition and I see constantly growing log.00000000000000.data files. Right after creating the index the file is only a few MB, but after a while it is 1 GB or more.

I see entries like this in indexer.log:

2019-11-06T05:42:33.242+01:00 [Info] content/newestCat/Mainstore#9608566369239551069:0 Plasma: logCleaner: starting… frag 93, data: 1086342, used: 16954970 log:(755273830 - 772755456)
2019-11-06T05:42:33.245+01:00 [Info] content/newestCat/Mainstore#9608566369239551069:0 Plasma: logCleaner: completed… frag 93, data: 1086342, used: 16588370, relocated: 1111, retries: 0, skipped: 3102 log:(755273830 - 772755456)

If I read this correctly, the logCleaner starts with frag 93 and ends with frag 93, so it is not compacting anything, right?

Thanks, Pascal

akhilmd · November 6, 2019, 7:20pm

@gizmo74, from the logs that you have pasted, there is a slight drop in the used size, but it appears that the logCleaner is not able to compact the log file by much.

From the indexer stats, could you tell me the data_size, items_count and num_items_flushed for this index? It would be more useful if you could attach the entire indexer.log.

I would also suggest contacting Couchbase support for a more thorough analysis.
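As a reading aid for the two logCleaner lines quoted at the start of the thread: the logged frag value is consistent with the share of used log space that is not live data, truncated to a whole percent. That formula is inferred from the logged numbers, not taken from Plasma documentation or source, and fragPercent is a hypothetical helper name.

```go
package main

import "fmt"

// fragPercent is a hypothetical helper: it returns the share of used log
// space that is not live data, truncated to a whole percent. The formula is
// an assumption inferred from the logged values.
func fragPercent(data, used int64) int64 {
	if used == 0 {
		return 0
	}
	return (used - data) * 100 / used
}

func main() {
	// Values copied from the "starting" and "completed" log lines.
	fmt.Println(fragPercent(1086342, 16954970)) // 93
	fmt.Println(fragPercent(1086342, 16588370)) // 93

	// Bytes of used space released by this cleaner pass.
	fmt.Println(16954970 - 16588370) // 366600
}
```

Under that assumption, the pass released about 366 KB out of roughly 17 MB of used space, which is too little to move the truncated percentage away from 93.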

gizmo74 · November 7, 2019, 9:57am

@akhilmd

Thanks for your answer. Here is a simple, reproducible example. Of course I can open an enterprise ticket, but this may also be interesting for others, because it does not seem to be limited to my production cluster. The example was tested on a single local Couchbase instance running in Docker, and I got a very similar result on the prod cluster.

created a bucket “test”
run sample script (see end of this post) once
create index: CREATE INDEX foo on test (ID);

after creating index:
root@286d98ddb415:/opt/couchbase/var/lib/couchbase/data/@2i/test_foo_7086060725474839149_0.index/mainIndex# ls -l
total 512
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 07:50 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 07:50 index
-rwxr-x--- 1 couchbase couchbase 507904 Nov 7 07:50 log.00000000000000.data

2nd run
(10min later after dump on disk)
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 07:59 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 07:50 index
-rwxr-x--- 1 couchbase couchbase 614400 Nov 7 07:59 log.00000000000000.data

run 20 times
(10min later after dump on disk)
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 08:09 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 08:09 index
-rwxr-x--- 1 couchbase couchbase 868352 Nov 7 08:09 log.00000000000000.data

run another 80 times
(10min later after dump on disk)
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 08:19 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 08:19 index
-rwxr-x--- 1 couchbase couchbase 1196032 Nov 7 08:19 log.00000000000000.data

do nothing for 20 minutes…, still the same (makes sense, because no update)
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 08:19 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 08:19 index
-rwxr-x--- 1 couchbase couchbase 1196032 Nov 7 08:19 log.00000000000000.data

next… run job once every 10 minutes… 1 hour later:
-rwxr-x--- 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x--- 1 couchbase couchbase 8192 Nov 7 09:34 header.data
drwxr-x--- 3 couchbase couchbase 4096 Nov 7 09:34 index
-rwxr-x--- 1 couchbase couchbase 3227648 Nov 7 09:34 log.00000000000000.data

So log.00000000000000.data grew from about 0.5 MB to 3.2 MB while the data was only being updated; not a single new item was added. Even in this simple use case the log keeps growing, so compaction/logCleaner does not seem to work as it should. Or I’m doing something completely wrong here.

Here is the output of the stats:

{
“Index”: “test:foo”, “Id”: 7086060725474839149,
“Stats”:
{
“MainStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 36,
“purges”: 0,
“splits”: 31,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 61,
“split_conflicts”: 51,
“merge_conflicts”: 0,
“insert_conflicts”: 672,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 664198,
“memory_size_index”: 2412,
“allocated”: 1725478,
“freed”: 1061280,
“reclaimed”: 1061280,
“reclaim_pending”: 0,
“reclaim_list_size”: 1061280,
“reclaim_list_count”: 36,
“reclaim_threshold”: 11,
“allocated_index”: 2412,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 32,
“items_count”: 10000,
“num_rec_allocs”: 33413,
“num_rec_frees”: 19804,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 336669,
“bytes_written”: 3227648,
“write_amp”: 1.50,
“write_amp_avg”: 9.59,
“lss_fragmentation”: 94,
“lss_data_size”: 170649,
“lss_used_space”: 3227648,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 10002,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 2,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36090,
“currSn”: 11,
“gcSn”: 9,
“gcSnIntervals”: “[0 9 10]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 3227648,
“num_wctxs”: 22,
“num_readers”: 24,
“num_writers”: 8,
“buf_memused”: 421733,
“page_bytes”: 2140299,
“page_cnt”: 179,
“page_itemcnt”: 63452,
“avg_item_size”: 33,
“avg_page_size”: 11956
},
“BackStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 312,
“purges”: 0,
“splits”: 55,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 295,
“split_conflicts”: 50,
“merge_conflicts”: 0,
“insert_conflicts”: 701,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 650528,
“memory_size_index”: 4612,
“allocated”: 4645708,
“freed”: 3995180,
“reclaimed”: 3995180,
“reclaim_pending”: 0,
“reclaim_list_size”: 1084547,
“reclaim_list_count”: 80,
“reclaim_threshold”: 25,
“allocated_index”: 4612,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 56,
“items_count”: 10000,
“num_rec_allocs”: 93754,
“num_rec_frees”: 80056,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 376669,
“bytes_written”: 3309568,
“write_amp”: 1.26,
“write_amp_avg”: 8.79,
“lss_fragmentation”: 94,
“lss_data_size”: 181080,
“lss_used_space”: 3309568,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 1078397,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 1068397,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36980,
“currSn”: 11,
“gcSn”: 9,
“gcSnIntervals”: “[0 9 10]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 3309568,
“num_wctxs”: 22,
“num_readers”: 0,
“num_writers”: 8,
“buf_memused”: 287342,
“page_bytes”: 6023974,
“page_cnt”: 712,
“page_itemcnt”: 159957,
“avg_item_size”: 37,
“avg_page_size”: 8460
}}
}

{
“Index”: “test:foo”, “Id”: 7086060725474839149,
“Stats”:
{
“MainStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 36,
“purges”: 0,
“splits”: 31,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 61,
“split_conflicts”: 51,
“merge_conflicts”: 0,
“insert_conflicts”: 672,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 664198,
“memory_size_index”: 2412,
“allocated”: 1725478,
“freed”: 1061280,
“reclaimed”: 1061280,
“reclaim_pending”: 0,
“reclaim_list_size”: 1061280,
“reclaim_list_count”: 36,
“reclaim_threshold”: 11,
“allocated_index”: 2412,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 32,
“items_count”: 10000,
“num_rec_allocs”: 33413,
“num_rec_frees”: 19804,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 336669,
“bytes_written”: 1896448,
“write_amp”: 1.50,
“write_amp_avg”: 5.63,
“lss_fragmentation”: 91,
“lss_data_size”: 170649,
“lss_used_space”: 1896448,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 10002,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 2,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36090,
“currSn”: 7,
“gcSn”: 5,
“gcSnIntervals”: “[0 5 6]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 1896448,
“num_wctxs”: 22,
“num_readers”: 24,
“num_writers”: 8,
“buf_memused”: 421733,
“page_bytes”: 2140299,
“page_cnt”: 179,
“page_itemcnt”: 63452,
“avg_item_size”: 33,
“avg_page_size”: 11956
},
“BackStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 312,
“purges”: 0,
“splits”: 55,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 295,
“split_conflicts”: 50,
“merge_conflicts”: 0,
“insert_conflicts”: 701,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 650528,
“memory_size_index”: 4612,
“allocated”: 4645708,
“freed”: 3995180,
“reclaimed”: 3995180,
“reclaim_pending”: 0,
“reclaim_list_size”: 1084547,
“reclaim_list_count”: 80,
“reclaim_threshold”: 25,
“allocated_index”: 4612,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 56,
“items_count”: 10000,
“num_rec_allocs”: 93754,
“num_rec_frees”: 80056,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 376669,
“bytes_written”: 1884160,
“write_amp”: 1.26,
“write_amp_avg”: 5.00,
“lss_fragmentation”: 90,
“lss_data_size”: 181080,
“lss_used_space”: 1884160,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 1038397,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 1028397,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36980,
“currSn”: 7,
“gcSn”: 5,
“gcSnIntervals”: “[0 5 6]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 1884160,
“num_wctxs”: 22,
“num_readers”: 0,
“num_writers”: 8,
“buf_memused”: 287342,
“page_bytes”: 6023974,
“page_cnt”: 712,
“page_itemcnt”: 159957,
“avg_item_size”: 37,
“avg_page_size”: 8460
}}
}
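To make the two MainStore snapshots easier to compare, here is a minimal sketch that recomputes write_amp_avg and the amount of non-live log space from the raw counters. It assumes write_amp_avg is simply bytes_written divided by bytes_incoming, which matches the reported values but is an inference; storeStats, writeAmp and staleBytes are hypothetical names.

```go
package main

import "fmt"

// storeStats holds the counters used below. The comments give the JSON key
// each field was copied from in the stats output.
type storeStats struct {
	BytesIncoming int64 // bytes_incoming
	BytesWritten  int64 // bytes_written
	LSSDataSize   int64 // lss_data_size
	LSSUsedSpace  int64 // lss_used_space
}

// writeAmp returns bytes written to the log per byte received.
// Assumption: this is how write_amp_avg is derived.
func writeAmp(s storeStats) float64 {
	if s.BytesIncoming == 0 {
		return 0
	}
	return float64(s.BytesWritten) / float64(s.BytesIncoming)
}

// staleBytes returns the used log space not accounted for by live data.
func staleBytes(s storeStats) int64 {
	return s.LSSUsedSpace - s.LSSDataSize
}

func main() {
	// MainStore values from the snapshot with currSn 7.
	earlier := storeStats{336669, 1896448, 170649, 1896448}
	// MainStore values from the snapshot with currSn 11.
	later := storeStats{336669, 3227648, 170649, 3227648}

	fmt.Printf("%.2f %d\n", writeAmp(earlier), staleBytes(earlier)) // 5.63 1725799
	fmt.Printf("%.2f %d\n", writeAmp(later), staleBytes(later))     // 9.59 3056999
}
```

The live data size is identical in both snapshots (170649 bytes); only the used space and therefore the stale portion grows.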

And here is my sample script:

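package main

import (
	"fmt"
	"log"
	"math/rand"
	"time"

	"gopkg.in/couchbase/gocb.v1"
)

// data is the document stored under each key.
type data struct {
	ID    int
	Value string
}

func main() {
	rand.Seed(time.Now().UTC().UnixNano())

	cluster, err := gocb.Connect("couchbase://localhost")
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	bucket, err := cluster.OpenBucket("test", "test")
	if err != nil {
		log.Fatalf("open bucket: %v", err)
	}
	defer bucket.Close()

	// Rewrite the same 10000 keys on every run: ID stays fixed and only
	// Value changes, so no new items are added.
	for i := 0; i < 10000; i++ {
		key := fmt.Sprintf("key_%d", i)
		d := data{ID: i, Value: fmt.Sprintf("Somerandomvalue_%d", rand.Intn(10000000))}
		if _, err := bucket.Upsert(key, &d, 0); err != nil {
			log.Printf("upsert %s: %v", key, err)
		}
	}
}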

akhilmd · November 8, 2019, 9:08am

The logCleaner only runs if the used space is greater than 16 MB. In the sample you provided the files reach only about 3 MB, so it is expected that the file size keeps increasing even though no new items are being inserted.

Also, index updates are processed as delete+insert. In plasma, deletes are ‘lazy’ and items are only marked as deleted by adding some metadata. Items are only cleaned up (during compaction and logCleaning) if a lot of stale data has accumulated.
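The effect of those two points can be sketched with a toy model. This is not Plasma code: toyLog only counts bytes, ignores the delete-marker metadata, and uses the 16 MB figure from this post as its only cleaner condition. The 33-byte item size is borrowed from avg_item_size in the MainStore stats purely for illustration.

```go
package main

import "fmt"

// cleanerMinUsed is the 16 MB used-space threshold mentioned in the post.
const cleanerMinUsed = 16 << 20

// toyLog is a deliberately simplified stand-in for a log-structured store.
// It tracks byte counts only, not actual records.
type toyLog struct {
	live map[string]int64 // key -> size of the current version
	used int64            // total bytes appended to the log so far
}

func newToyLog() *toyLog {
	return &toyLog{live: make(map[string]int64)}
}

// update models an index update as delete+insert with a lazy delete: the old
// version stays in the log as stale bytes and the new version is appended.
func (l *toyLog) update(key string, size int64) {
	l.live[key] = size
	l.used += size
}

// liveBytes sums the sizes of the current versions of all keys.
func (l *toyLog) liveBytes() int64 {
	var n int64
	for _, s := range l.live {
		n += s
	}
	return n
}

// cleanerEligible reports whether used space exceeds the threshold.
func (l *toyLog) cleanerEligible() bool {
	return l.used > cleanerMinUsed
}

func main() {
	l := newToyLog()
	for run := 1; run <= 3; run++ {
		for i := 0; i < 10000; i++ {
			l.update(fmt.Sprintf("key_%d", i), 33)
		}
		fmt.Println(run, l.liveBytes(), l.used, l.cleanerEligible())
	}
	// Prints:
	// 1 330000 330000 false
	// 2 330000 660000 false
	// 3 330000 990000 false
}
```

Live bytes stay constant while used bytes grow with every run, and the model stays far below the threshold, which mirrors the behaviour seen in the sample.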

gizmo74 · November 8, 2019, 9:49am

Thanks for your answer, I didn’t know about the 16 MB limit. However, on the prod system I have some indexes that need more than 2 GB of disk storage while the real index size is only a few MB. I have opened an enterprise support ticket for that now, as it seems a bit too complicated for a forum discussion. Thanks anyway for your help.

gizmo74 · December 6, 2019, 2:35pm

Enterprise support found the problem: we used NFS as the data store. Because NFS doesn’t support sparse files, we ran into this problem. After moving the data directory from NFS to a block device, disk usage went down from 32 GB to less than 2 GB.
So if you ever see a problem like this: trust the documentation, NFS is not a good idea for Couchbase.
