We have a 4-node cluster with 24GB RAM per node, of which 18GB has been given to Couchbase, with zero replication. We have approximately 10M records in this cluster, with ~2.5M items/hour coming in and old items expiring. The cluster's RAM usage grows until the full ~72GB is used, roughly every 12 days, and I need to restart the cluster to fix this. After a restart the RAM usage drops back to ~20GB.
Can someone please help me understand the reason for this?
FYI: Auto-Compaction is set to a 40% fragmentation level, and the Metadata Purge Interval was set to 1 day, which we reduced to 2 hours. But it didn't help.
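For reference, this is roughly how we apply those settings via the REST API instead of the UI (a sketch; verify the endpoint and parameter names against the docs for your version before running; `purgeInterval` is expressed in days, so 0.08 is the ~2-hour setting):

```
# Cluster-wide auto-compaction settings (sketch; verify parameter names
# against the Couchbase REST API docs for your version).
curl -u Administrator:password -X POST \
  http://localhost:8091/controller/setAutoCompaction \
  -d 'databaseFragmentationThreshold[percentage]=40' \
  -d parallelDBAndViewCompaction=false \
  -d purgeInterval=0.08   # metadata purge interval in days (~2 hours)
```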
What do you mean by "full"? Couchbase by design will attempt to use as much of the Server Quota as you allocate to it for caching recently-accessed data. You mention you have set an 18GB Server Quota per node, and 4x that is indeed 72GB.
How much of the Server Quota have you allocated to Buckets? If you've allocated all of it, then Couchbase using 72GB to cache data is exactly what's expected.
Note also that there will be some additional usage over the Server Quota for general cluster management, but 6GB (24GB - 18GB) is probably sufficient in most cases.
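You can check the current allocation from the REST API; something along these lines should work (a sketch using the standard endpoints; adjust host and credentials, and note the JSON field names are from memory, so verify them against your version):

```
# Server Quota (memoryQuota is per node, in MB)
curl -s -u Administrator:password http://localhost:8091/pools/default |
  grep -o '"memoryQuota":[0-9]*'

# Per-bucket RAM quotas (cluster-wide, in bytes)
curl -s -u Administrator:password http://localhost:8091/pools/default/buckets |
  grep -o '"ram":[0-9]*'
```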
We have created a bucket with a 70GB quota out of the 72GB Server Quota. So here "full" means all 70GB of memory is getting used, while the disk data size remains ~22GB for the same cluster.
Currently we have sufficient free memory available, as we restarted the node 4 days ago:
```
$ free -m
             total       used       free     shared    buffers     cached
Mem:         23934      15908       8026          0        227       4282
-/+ buffers/cache:      11398      12536
Swap:         4095         12       4083

$ ps -o rss,vsz 19419
     RSS      VSZ
10466268 12859580
```
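One thing to keep in mind when reading the above: `ps` reports RSS and VSZ in KiB, so that RSS is roughly 10GiB resident. A quick conversion (assuming PID 19419 is the memcached process, as it appears to be):

```
# Convert the memcached RSS from KiB to GiB (PID 19419 from above)
ps -o rss= 19419 | awk '{printf "%.1f GiB\n", $1/1048576}'
# => ~10.0 GiB
```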
The main issue is the difference between these two Couchbase metrics, given that we don't have any replicas:
vb_active_itm_memory --> 3GB
ep_kv_size --> 10.5GB
In a few days this ep_kv_size will reach 15GB, and then the node will die…
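For anyone who wants to pull the same numbers, both metrics come from the kv-engine stats; something like this should work (a sketch using `cbstats` from the Couchbase bin directory; `BUCKET_NAME` is a placeholder, and adjust the path for your install):

```
# Dump all kv-engine stats for the bucket and pick out the two metrics
/opt/couchbase/bin/cbstats localhost:11210 -b BUCKET_NAME all |
  grep -E 'vb_active_itm_memory|ep_kv_size'
```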
```
MALLOC:    11324902704 (10800.3 MiB) Bytes in use by application
MALLOC: +   1119019008 ( 1067.2 MiB) Bytes in page heap freelist
MALLOC: +    292244776 (  278.7 MiB) Bytes in central cache freelist
MALLOC: +      8535392 (    8.1 MiB) Bytes in transfer cache freelist
MALLOC: +     28149832 (   26.8 MiB) Bytes in thread cache freelists
MALLOC: +     29814944 (   28.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  12802666656 (12209.6 MiB) Actual memory used (physical + swap)
MALLOC: +     60792832 (   58.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  12863459488 (12267.6 MiB) Virtual address space used
MALLOC:
MALLOC:         234654 Spans in use
MALLOC:             16 Thread heaps in use
MALLOC:           8192 Tcmalloc page size
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class
class   1 [        8 bytes ] :    13090 objs;   0.1 MiB;   0.1 cum MiB
class   2 [       16 bytes ] :    44695 objs;   0.7 MiB;   0.8 cum MiB
class   3 [       32 bytes ] :   219591 objs;   6.7 MiB;   7.5 cum MiB
class   4 [       48 bytes ] :    12820 objs;   0.6 MiB;   8.1 cum MiB
class   5 [       64 bytes ] :    18694 objs;   1.1 MiB;   9.2 cum MiB
class   6 [       80 bytes ] :   119907 objs;   9.1 MiB;  18.4 cum MiB
class   7 [       96 bytes ] :    93802 objs;   8.6 MiB;  26.9 cum MiB
class   8 [      112 bytes ] :    65668 objs;   7.0 MiB;  34.0 cum MiB
class   9 [      128 bytes ] :   171914 objs;  21.0 MiB;  54.9 cum MiB
class  10 [      144 bytes ] :   119318 objs;  16.4 MiB;  71.3 cum MiB
…
```
OK, so this means that Couchbase's kv-engine (`memcached`) on that one node has requested 10,800.3 MiB (~10.5GiB), and adding on memory allocator overhead it's actually using 12,209.6 MiB (~11.9GiB) of RAM. That's not an unusual amount of overhead with TCMalloc.
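To spell out the arithmetic (just recomputing the numbers already shown in the MALLOC dump above):

```
# Allocator overhead = actual physical memory / bytes in use by application
echo '12209.6 / 10800.3' | bc -l
# => 1.13, i.e. roughly 13% overhead on top of what kv-engine requested
```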
I assume these stats are from when your cluster is "good". Can you re-post them when you're actually seeing the unexpectedly high usage?
Additionally, you might want to start testing the 4.0 RC: it has a different memory allocator (jemalloc) and an active defragmenter (on Linux), which should reduce memory allocator overhead.