Constantly growing plasma index files

Hi,

We’re using Couchbase 6.0.3 Enterprise Edition and I see constantly growing
log.00000000000000.data files. After creating the index it’s only a few MB, but after a while it’s 1G or more.

I see Entries like this in indexer.log

2019-11-06T05:42:33.242+01:00 [Info] content/newestCat/Mainstore#9608566369239551069:0 Plasma: logCleaner: starting… frag 93, data: 1086342, used: 16954970 log:(755273830 - 772755456)
2019-11-06T05:42:33.245+01:00 [Info] content/newestCat/Mainstore#9608566369239551069:0 Plasma: logCleaner: completed… frag 93, data: 1086342, used: 16588370, relocated: 1111, retries: 0, skipped: 31
02 log:(755273830 - 772755456)

If I read this correctly, then logCleaner is starting with Frag 93 and Ending with Frag 93, so it is not compacting anything, right?

Thanks, Pascal

@gizmo74, from the logs that you have pasted, there is a slight drop in the used size, but it appears that the logCleaner is not able to compact the log file by much.

From the indexer stats, could you tell me the data_size, items_count and num_items_flushed for this index? It would be more useful if you could attach the entire indexer.log.

I would also suggest you to contact couchbase support for a more thorough analysis.

@akhilmd

Thanks for your answer. Here a simple, reproducible example. Of course I can open a enterprise ticket, but maybe this is also interesting for others, because it seems to be not limited to my production cluster. This example was tested on a single local couchbase instance running in docker. And I got a very similar result on the prod cluster.

  1. created a bucket “test”
  2. run sample script (see end of this post) once
  3. create index: CREATE INDEX foo on test (ID);

after creating index:
root@286d98ddb415:/opt/couchbase/var/lib/couchbase/data/@2i/test_foo_7086060725474839149_0.index/mainIndex# ls -l
total 512
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 07:50 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 07:50 index
-rwxr-x— 1 couchbase couchbase 507904 Nov 7 07:50 log.00000000000000.data

2nd run
(10min later after dump on disk)
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 07:59 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 07:50 index
-rwxr-x— 1 couchbase couchbase 614400 Nov 7 07:59 log.00000000000000.data

run 20 times
(10min later after dump on disk)
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 08:09 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 08:09 index
-rwxr-x— 1 couchbase couchbase 868352 Nov 7 08:09 log.00000000000000.data

run another 80 times
(10min later after dump on disk)
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 08:19 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 08:19 index
-rwxr-x— 1 couchbase couchbase 1196032 Nov 7 08:19 log.00000000000000.data

do nothing for 20 minutes…, still the same (makes sense, because no update)
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 08:19 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 08:19 index
-rwxr-x— 1 couchbase couchbase 1196032 Nov 7 08:19 log.00000000000000.data

next… run job once every 10 minutes… 1 hour later:
-rwxr-x— 1 couchbase couchbase 36 Nov 7 07:50 config.json
-rwxr-x— 1 couchbase couchbase 8192 Nov 7 09:34 header.data
drwxr-x— 3 couchbase couchbase 4096 Nov 7 09:34 index
-rwxr-x— 1 couchbase couchbase 3227648 Nov 7 09:34 log.00000000000000.data

so this log.00000000000000.data was growing from 0.5M to 3.2 while only updating data, not a single new item was added. So it seems also in this simple use case the log is always growing, compaction/logcleaner seems not to work as it should. Or I’m doing something completly wrong here.

Here the output of stats:

{
“Index”: “test:foo”, “Id”: 7086060725474839149,
“Stats”:
{
“MainStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 36,
“purges”: 0,
“splits”: 31,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 61,
“split_conflicts”: 51,
“merge_conflicts”: 0,
“insert_conflicts”: 672,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 664198,
“memory_size_index”: 2412,
“allocated”: 1725478,
“freed”: 1061280,
“reclaimed”: 1061280,
“reclaim_pending”: 0,
“reclaim_list_size”: 1061280,
“reclaim_list_count”: 36,
“reclaim_threshold”: 11,
“allocated_index”: 2412,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 32,
“items_count”: 10000,
“num_rec_allocs”: 33413,
“num_rec_frees”: 19804,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 336669,
“bytes_written”: 3227648,
“write_amp”: 1.50,
“write_amp_avg”: 9.59,
“lss_fragmentation”: 94,
“lss_data_size”: 170649,
“lss_used_space”: 3227648,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 10002,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 2,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36090,
“currSn”: 11,
“gcSn”: 9,
“gcSnIntervals”: “[0 9 10]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 3227648,
“num_wctxs”: 22,
“num_readers”: 24,
“num_writers”: 8,
“buf_memused”: 421733,
“page_bytes”: 2140299,
“page_cnt”: 179,
“page_itemcnt”: 63452,
“avg_item_size”: 33,
“avg_page_size”: 11956
},
“BackStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 312,
“purges”: 0,
“splits”: 55,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 295,
“split_conflicts”: 50,
“merge_conflicts”: 0,
“insert_conflicts”: 701,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 650528,
“memory_size_index”: 4612,
“allocated”: 4645708,
“freed”: 3995180,
“reclaimed”: 3995180,
“reclaim_pending”: 0,
“reclaim_list_size”: 1084547,
“reclaim_list_count”: 80,
“reclaim_threshold”: 25,
“allocated_index”: 4612,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 56,
“items_count”: 10000,
“num_rec_allocs”: 93754,
“num_rec_frees”: 80056,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 376669,
“bytes_written”: 3309568,
“write_amp”: 1.26,
“write_amp_avg”: 8.79,
“lss_fragmentation”: 94,
“lss_data_size”: 181080,
“lss_used_space”: 3309568,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 1078397,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 1068397,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36980,
“currSn”: 11,
“gcSn”: 9,
“gcSnIntervals”: “[0 9 10]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 3309568,
“num_wctxs”: 22,
“num_readers”: 0,
“num_writers”: 8,
“buf_memused”: 287342,
“page_bytes”: 6023974,
“page_cnt”: 712,
“page_itemcnt”: 159957,
“avg_item_size”: 37,
“avg_page_size”: 8460
}}
}

{
“Index”: “test:foo”, “Id”: 7086060725474839149,
“Stats”:
{
“MainStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 36,
“purges”: 0,
“splits”: 31,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 61,
“split_conflicts”: 51,
“merge_conflicts”: 0,
“insert_conflicts”: 672,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 664198,
“memory_size_index”: 2412,
“allocated”: 1725478,
“freed”: 1061280,
“reclaimed”: 1061280,
“reclaim_pending”: 0,
“reclaim_list_size”: 1061280,
“reclaim_list_count”: 36,
“reclaim_threshold”: 11,
“allocated_index”: 2412,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 32,
“items_count”: 10000,
“num_rec_allocs”: 33413,
“num_rec_frees”: 19804,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 336669,
“bytes_written”: 1896448,
“write_amp”: 1.50,
“write_amp_avg”: 5.63,
“lss_fragmentation”: 91,
“lss_data_size”: 170649,
“lss_used_space”: 1896448,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 10002,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 2,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36090,
“currSn”: 7,
“gcSn”: 5,
“gcSnIntervals”: “[0 5 6]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 1896448,
“num_wctxs”: 22,
“num_readers”: 24,
“num_writers”: 8,
“buf_memused”: 421733,
“page_bytes”: 2140299,
“page_cnt”: 179,
“page_itemcnt”: 63452,
“avg_item_size”: 33,
“avg_page_size”: 11956
},
“BackStore”:
{
“memory_quota”: 483183820,
“punch_hole_support”: true,
“count”: 10000,
“compacts”: 312,
“purges”: 0,
“splits”: 55,
“merges”: 0,
“inserts”: 10000,
“deletes”: 0,
“compact_conflicts”: 295,
“split_conflicts”: 50,
“merge_conflicts”: 0,
“insert_conflicts”: 701,
“delete_conflicts”: 0,
“swapin_conflicts”: 0,
“persist_conflicts”: 0,
“memory_size”: 650528,
“memory_size_index”: 4612,
“allocated”: 4645708,
“freed”: 3995180,
“reclaimed”: 3995180,
“reclaim_pending”: 0,
“reclaim_list_size”: 1084547,
“reclaim_list_count”: 80,
“reclaim_threshold”: 25,
“allocated_index”: 4612,
“freed_index”: 0,
“reclaimed_index”: 0,
“num_pages”: 56,
“items_count”: 10000,
“num_rec_allocs”: 93754,
“num_rec_frees”: 80056,
“num_rec_swapout”: 0,
“num_rec_swapin”: 0,
“bytes_incoming”: 376669,
“bytes_written”: 1884160,
“write_amp”: 1.26,
“write_amp_avg”: 5.00,
“lss_fragmentation”: 90,
“lss_data_size”: 181080,
“lss_used_space”: 1884160,
“lss_num_reads”: 0,
“lss_read_bs”: 0,
“lss_blk_read_bs”: 0,
“lss_gc_num_reads”: 0,
“lss_gc_reads_bs”: 0,
“lss_blk_gc_reads_bs”: 0,
“lss_rdr_reads_bs”: 0,
“lss_blk_rdr_reads_bs”: 0,
“cache_hits”: 1038397,
“cache_misses”: 0,
“cache_hit_ratio”: 1.00000,
“rlss_num_reads”: 0,
“rcache_hits”: 1028397,
“rcache_misses”: 0,
“rcache_hit_ratio”: 1.00000,
“resident_ratio”: 1.00000,
“mvcc_purge_ratio”: 1.36980,
“currSn”: 7,
“gcSn”: 5,
“gcSnIntervals”: “[0 5 6]”,
“purger_running”: false,
“mem_throttled”: false,
“lss_throttled”: false,
“lss_head_offset”: 0,
“lss_tail_offset”: 1884160,
“num_wctxs”: 22,
“num_readers”: 0,
“num_writers”: 8,
“buf_memused”: 287342,
“page_bytes”: 6023974,
“page_cnt”: 712,
“page_itemcnt”: 159957,
“avg_item_size”: 37,
“avg_page_size”: 8460
}}

and here my sample script:

package main

import (
“fmt”
“gopkg.in/couchbase/gocb.v1”
“math/rand”
“time”
)

type data struct {
ID int
Value string
}

func main() {

rand.Seed(time.Now().UTC().UnixNano())

cluster, _ := gocb.Connect("couchbase://localhost")
bucket, _ := cluster.OpenBucket("test", "test")

for i := 0; i < 10000; i++ {
	key := fmt.Sprintf("key_%d", i)
	d := data{ID: i, Value: fmt.Sprintf("Somerandomvalue_%d", rand.Intn(10000000))}
	bucket.Upsert(key, &d, 0)
}

}

The logCleaner only runs if the used space is greater than 16MB. In the sample you provided, it is expected that the file size keeps increasing even though no new items are being inserted as the files reach only 3MB.

Also, index updates are processed as delete+insert. In plasma, deletes are ‘lazy’ and items are only marked as deleted by adding some metadata. Items are only cleaned up (during compaction and logCleaning) if a lot of stale data has accumulated.

Thanks for your answer, I didn’t know the 16M limit. However, I have on the prod system some indices that needs > 2GB of disk storage while real index size is only a few MB. But i opened a enterprise support ticket for that now, seems a bit too complicated for a forum discussion. Thanks anyway for your help.

Enterpise support found the problem: we used NFS as data store. While NFS doesn’t support sparse files we ran into this problem. After moving the data directory from nfs to block device, disk use went down from 32G to < 2G.
So if you ever see a problem like this: trust the documentation, NFS is not a good idea for couchbase. :slight_smile: