ep_kv_size != vb_active_itm_memory + vb_replica_itm_memory + vb_pending_itm_memory
Hello,
We have 2 nodes in a cluster with replicas enabled.
mem_used increased and items began to be ejected.
To limit disk I/O, we changed the values of ep_mem_low_wat and ep_mem_high_wat.
The problem is that vb_active_itm_memory decreased, but mem_used and ep_kv_size did not decrease enough.
Do you have any idea what the problem is?
Server 1:
ep_kv_size: 20336694480
ep_max_data_size: 60188262400
ep_mem_high_wat: 55000000000
ep_mem_low_wat: 52000000000
mem_used: 20659513297
vb_active_itm_memory: 9788261510
vb_pending_itm_memory: 0
vb_replica_itm_memory: 333461743
Server 2:
ep_kv_size: 53641447832
ep_max_data_size: 60188262400
ep_mem_high_wat: 55000000000
ep_mem_low_wat: 52000000000
mem_used: 54169448340
vb_active_itm_memory: 7811492591
vb_pending_itm_memory: 0
vb_replica_itm_memory: 2631742302
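To put a number on the mismatch in the title, here is a minimal sketch that subtracts the three per-state item memory figures from ep_kv_size for each server. The values are copied from the summary above; the function name `itm_memory_gap` is only an illustration, and the script makes no claim about what the remaining memory is actually used for.

```python
# Values copied from the per-server summary above (all in bytes).
servers = {
    "Server 1": {
        "ep_kv_size": 20336694480,
        "vb_active_itm_memory": 9788261510,
        "vb_pending_itm_memory": 0,
        "vb_replica_itm_memory": 333461743,
    },
    "Server 2": {
        "ep_kv_size": 53641447832,
        "vb_active_itm_memory": 7811492591,
        "vb_pending_itm_memory": 0,
        "vb_replica_itm_memory": 2631742302,
    },
}


def itm_memory_gap(stats):
    """Return ep_kv_size minus the sum of active, replica and pending item memory."""
    per_state = (
        stats["vb_active_itm_memory"]
        + stats["vb_replica_itm_memory"]
        + stats["vb_pending_itm_memory"]
    )
    return stats["ep_kv_size"] - per_state


for name, stats in servers.items():
    gap = itm_memory_gap(stats)
    share = 100.0 * gap / stats["ep_kv_size"]
    print(f"{name}: gap = {gap} bytes ({share:.1f}% of ep_kv_size)")
```

With these figures the unexplained part is about 10.2 GB on Server 1 and about 43.2 GB on Server 2.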
Full mbstats:
Server 1:
accepting_conns: 1
auth_cmds: 18216
auth_errors: 14
bucket_active_conns: 1
bucket_conns: 133
bytes_read: 15914128439596
bytes_written: 3985604346954
cas_badval: 5
cas_hits: 4
cas_misses: 0
cmd_flush: 9
cmd_get: 202544201
cmd_set: 40190937
conn_yields: 869641
connection_structures: 280
curr_connections: 235
curr_items: 735746
curr_items_tot: 995795
daemon_connections: 10
decr_hits: 10
decr_misses: 0
delete_hits: 157919
delete_misses: 631644
ep_bg_fetched: 92706
ep_bg_load: 23504536579
ep_bg_load_avg: 253538
ep_bg_max_load: 33715951
ep_bg_max_wait: 67624281
ep_bg_min_load: 11
ep_bg_min_wait: 17
ep_bg_num_samples: 92706
ep_bg_wait: 238820898105
ep_bg_wait_avg: 2576110
ep_commit_num: 96009
ep_commit_time: 91
ep_commit_time_total: 2495883
ep_data_age: 21679
ep_data_age_highwat: 201764
ep_db_cleaner_status: complete
ep_db_strategy: multiMTVBDB
ep_dbinit: 1
ep_dbname: /opt/membase/var/lib/membase/data/default-data/default
ep_dbshards: 4
ep_diskqueue_drain: 1923821
ep_diskqueue_fill: 3216866
ep_diskqueue_items: 1306029
ep_diskqueue_memory: 114930552
ep_diskqueue_pending: 8875522673
ep_expired: 6831961345
ep_flush_all: false
ep_flush_duration: 67633
ep_flush_duration_highwat: 201742
ep_flush_duration_total: 2822221
ep_flush_preempts: 0
ep_flusher_state: running
ep_flusher_todo: 918032
ep_io_num_read: 92706
ep_io_num_write: 39621407
ep_io_read_bytes: 1174624958
ep_io_write_bytes: 1076262004165
ep_item_begin_failed: 0
ep_item_commit_failed: 0
ep_item_flush_expired: 34415938
ep_item_flush_failed: 0
ep_items_rm_from_checkpoints: 107389430
ep_kv_size: 20336694480
ep_latency_arith_cmd: 30
ep_latency_get_cmd: 202640003
ep_latency_store_cmd: 40190981
ep_max_data_size: 60188262400
ep_max_txn_size: 1000
ep_mem_high_wat: 55000000000
ep_mem_low_wat: 52000000000
ep_min_data_age: 0
ep_num_active_non_resident: 0
ep_num_checkpoint_remover_runs: 573219
ep_num_eject_failures: 26177
ep_num_eject_replicas: 331521
ep_num_expiry_pager_runs: 796
ep_num_non_resident: 67845
ep_num_not_my_vbuckets: 3957
ep_num_pager_runs: 11444
ep_num_value_ejects: 842679
ep_onlineupdate: false
ep_onlineupdate_revert_add: 0
ep_onlineupdate_revert_delete: 0
ep_onlineupdate_revert_update: 0
ep_oom_errors: 0
ep_overhead: 322818817
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_queue_age_cap: 900
ep_queue_size: 388090
ep_storage_age: 21679
ep_storage_age_highwat: 201764
ep_storage_type: featured
ep_store_max_concurrency: 10
ep_store_max_readers: 9
ep_store_max_readwrite: 1
ep_tap_bg_fetch_requeued: 0
ep_tap_bg_fetched: 0
ep_tap_keepalive: 300
ep_tmp_oom_errors: 624159
ep_too_old: 37120299
ep_too_young: 0
ep_total_cache_size: 99510786981
ep_total_del_items: 27175531
ep_total_enqueued: 107292095
ep_total_new_items: 29440402
ep_total_persisted: 66796597
ep_uncommitted_items: 631
ep_value_size: 20154758771
ep_vb_total: 1024
ep_vbucket_del: 0
ep_vbucket_del_fail: 0
ep_version: 1.7.0_56_g8d17d20
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 12362
get_hits: 165412320
get_misses: 37131881
incr_hits: 20
incr_misses: 0
libevent: 2.0.11-stable
limit_maxbytes: 67108864
listen_disabled_num: 0
mem_used: 20659513297
pid: 14805
pointer_size: 64
rejected_conns: 0
rusage_system: 403203.018613
rusage_user: 200770.447368
tap_checkpoint_end_received: 4563041
tap_checkpoint_end_sent: 4835274
tap_checkpoint_start_received: 4576623
tap_checkpoint_start_sent: 4845443
tap_connect_received: 1033
tap_delete_received: 23538981
tap_delete_sent: 25940692
tap_flush_sent: 17
tap_mutation_received: 388662039
tap_mutation_sent: 137520654
tap_opaque_received: 22
tap_opaque_sent: 2066
tap_vbucket_set_sent: 2048
threads: 4
time: 1326833114
total_connections: 43140
uptime: 2866224
vb_active_curr_items: 735746
vb_active_eject: 511158
vb_active_ht_memory: 12836864
vb_active_itm_memory: 9788261510
vb_active_num: 512
vb_active_num_non_resident: 0
vb_active_ops_create: 448081
vb_active_ops_delete: 20266
vb_active_ops_reject: 0
vb_active_ops_update: 15789
vb_active_perc_mem_resident: 100
vb_active_queue_age: 20041103685000
vb_active_queue_drain: 1334450
vb_active_queue_fill: 2127889
vb_active_queue_memory: 69822632
vb_active_queue_pending: 6119212491
vb_active_queue_size: 793439
vb_dead_num: 0
vb_pending_curr_items: 0
vb_pending_eject: 0
vb_pending_ht_memory: 0
vb_pending_itm_memory: 0
vb_pending_num: 0
vb_pending_num_non_resident: 0
vb_pending_ops_create: 0
vb_pending_ops_delete: 0
vb_pending_ops_reject: 0
vb_pending_ops_update: 0
vb_pending_perc_mem_resident: 0
vb_pending_queue_age: 0
vb_pending_queue_drain: 0
vb_pending_queue_fill: 0
vb_pending_queue_memory: 0
vb_pending_queue_pending: 0
vb_pending_queue_size: 0
vb_replica_curr_items: 260049
vb_replica_eject: 331521
vb_replica_ht_memory: 12836864
vb_replica_itm_memory: 333461743
vb_replica_num: 512
vb_replica_num_non_resident: 67845
vb_replica_ops_create: 136889
vb_replica_ops_delete: 131
vb_replica_ops_reject: 0
vb_replica_ops_update: 104
vb_replica_perc_mem_resident: 73
vb_replica_queue_age: 17024677637000
vb_replica_queue_drain: 589371
vb_replica_queue_fill: 1088977
vb_replica_queue_memory: 45107920
vb_replica_queue_pending: 2756310182
vb_replica_queue_size: 512590
version: 1.4.4_461_gf99c147
Server 2:
accepting_conns: 1
auth_cmds: 68933
auth_errors: 0
bucket_active_conns: 1
bucket_conns: 123
bytes_read: 6683777035162
bytes_written: 3944385500761
cas_badval: 1
cas_hits: 1
cas_misses: 0
cmd_flush: 9
cmd_get: 186240856
cmd_set: 35573333
conn_yields: 759146
connection_structures: 635
curr_connections: 235
curr_items: 704041
curr_items_tot: 1322787
daemon_connections: 10
decr_hits: 32
decr_misses: 0
delete_hits: 164155
delete_misses: 649755
ep_bg_fetched: 438971
ep_bg_load: 83877283633
ep_bg_load_avg: 191077
ep_bg_max_load: 23227772
ep_bg_max_wait: 80925225
ep_bg_min_load: 7
ep_bg_min_wait: 12
ep_bg_num_samples: 438971
ep_bg_wait: 1156308009837
ep_bg_wait_avg: 2634133
ep_commit_num: 165122
ep_commit_time: 11
ep_commit_time_total: 2526251
ep_data_age: 17068
ep_data_age_highwat: 99350
ep_db_cleaner_status: complete
ep_db_strategy: multiMTVBDB
ep_dbinit: 1
ep_dbname: /opt/membase/var/lib/membase/data/default-data/default
ep_dbshards: 4
ep_diskqueue_drain: 3595730
ep_diskqueue_fill: 3827625
ep_diskqueue_items: 1029192
ep_diskqueue_memory: 90568896
ep_diskqueue_pending: 9983975911
ep_expired: 2360197242
ep_flush_all: false
ep_flush_duration: 29199
ep_flush_duration_highwat: 58380
ep_flush_duration_total: 2768105
ep_flush_preempts: 0
ep_flusher_state: running
ep_flusher_todo: 565510
ep_io_num_read: 438971
ep_io_num_write: 46477831
ep_io_read_bytes: 5596563515
ep_io_write_bytes: 1482818728017
ep_item_begin_failed: 0
ep_item_commit_failed: 0
ep_item_flush_expired: 39146825
ep_item_flush_failed: 0
ep_items_rm_from_checkpoints: 111845763
ep_kv_size: 53641447832
ep_latency_arith_cmd: 52
ep_latency_get_cmd: 186679597
ep_latency_store_cmd: 35573387
ep_max_data_size: 60188262400
ep_max_txn_size: 1000
ep_mem_high_wat: 55000000000
ep_mem_low_wat: 52000000000
ep_min_data_age: 0
ep_num_active_non_resident: 195229
ep_num_checkpoint_remover_runs: 571980
ep_num_eject_failures: 59713
ep_num_eject_replicas: 2255313
ep_num_expiry_pager_runs: 794
ep_num_non_resident: 623033
ep_num_not_my_vbuckets: 0
ep_num_pager_runs: 9413
ep_num_value_ejects: 3736714
ep_onlineupdate: false
ep_onlineupdate_revert_add: 0
ep_onlineupdate_revert_delete: 0
ep_onlineupdate_revert_update: 0
ep_oom_errors: 0
ep_overhead: 528000508
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_queue_age_cap: 900
ep_queue_size: 476462
ep_storage_age: 17068
ep_storage_age_highwat: 99350
ep_storage_type: featured
ep_store_max_concurrency: 10
ep_store_max_readers: 9
ep_store_max_readwrite: 1
ep_tap_bg_fetch_requeued: 0
ep_tap_bg_fetched: 0
ep_tap_keepalive: 300
ep_tmp_oom_errors: 205882
ep_too_old: 41966459
ep_too_young: 0
ep_total_cache_size: 135260814877
ep_total_del_items: 31887022
ep_total_enqueued: 110441500
ep_total_new_items: 35665787
ep_total_persisted: 78360518
ep_uncommitted_items: 288
ep_value_size: 53400952337
ep_vb_total: 1024
ep_vbucket_del: 0
ep_vbucket_del_fail: 0
ep_version: 1.7.0_56_g8d17d20
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 12841
get_hits: 151019170
get_misses: 35221686
incr_hits: 17
incr_misses: 3
libevent: 2.0.11-stable
limit_maxbytes: 67108864
listen_disabled_num: 0
mem_used: 54169448340
pid: 2876
pointer_size: 64
rejected_conns: 0
rusage_system: 186242.315417
rusage_user: 164383.813346
tap_checkpoint_end_received: 4830300
tap_checkpoint_end_sent: 4558060
tap_checkpoint_start_received: 4840469
tap_checkpoint_start_sent: 4571642
tap_connect_received: 8
tap_delete_received: 25940670
tap_delete_sent: 23538951
tap_flush_received: 17
tap_flush_sent: 26
tap_mutation_received: 132777732
tap_mutation_sent: 384369463
tap_opaque_received: 2073
tap_opaque_sent: 16
tap_vbucket_set_received: 2048
threads: 4
time: 1326827312
total_connections: 93754
uptime: 2860070
vb_active_curr_items: 704041
vb_active_eject: 1481401
vb_active_ht_memory: 12836864
vb_active_itm_memory: 7811492591
vb_active_num: 512
vb_active_num_non_resident: 195229
vb_active_ops_create: 560502
vb_active_ops_delete: 16976
vb_active_ops_reject: 0
vb_active_ops_update: 47508
vb_active_perc_mem_resident: 72
vb_active_queue_age: 7939953381000
vb_active_queue_drain: 1919634
vb_active_queue_fill: 2020325
vb_active_queue_memory: 53258568
vb_active_queue_pending: 4333745647
vb_active_queue_size: 605211
vb_dead_num: 0
vb_pending_curr_items: 0
vb_pending_eject: 0
vb_pending_ht_memory: 0
vb_pending_itm_memory: 0
vb_pending_num: 0
vb_pending_num_non_resident: 0
vb_pending_ops_create: 0
vb_pending_ops_delete: 0
vb_pending_ops_reject: 0
vb_pending_ops_update: 0
vb_pending_perc_mem_resident: 0
vb_pending_queue_age: 0
vb_pending_queue_drain: 0
vb_pending_queue_fill: 0
vb_pending_queue_memory: 0
vb_pending_queue_pending: 0
vb_pending_queue_size: 0
vb_replica_curr_items: 618746
vb_replica_eject: 2255313
vb_replica_ht_memory: 12836864
vb_replica_itm_memory: 2631742302
vb_replica_num: 512
vb_replica_num_non_resident: 427804
vb_replica_ops_create: 615911
vb_replica_ops_delete: 123587
vb_replica_ops_reject: 0
vb_replica_ops_update: 74634
vb_replica_perc_mem_resident: 30
vb_replica_queue_age: 5495160029000
vb_replica_queue_drain: 1676096
vb_replica_queue_fill: 1807300
vb_replica_queue_memory: 37310328
vb_replica_queue_pending: 5650230264
vb_replica_queue_size: 423981
version: 1.4.4_461_gf99c147
Hello,
We've upgraded to Couchbase 1.8. The first problem, with RAM usage, seems to be solved.
But we still have the problem with the disk queue. The drain rate isn't high enough and the disk queue keeps growing. It will crash again.
I tried a vacuum but it didn't solve the problem, and now the drain rate is NULL.
/opt/couchbase/bin/sqlite3 default-1.mb
SQLite version 3.7.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> vacuum;
Do you have any idea? This is critical: if we can't solve it, we'll have to change our cache system.
/opt/couchbase/bin/cbstats localhost:11210 all
accepting_conns: 1
auth_cmds: 639
auth_errors: 0
bucket_active_conns: 1
bucket_conns: 124
bytes_read: 220423702882
bytes_written: 346980308189
cas_badval: 0
cas_hits: 0
cas_misses: 0
cmd_flush: 0
cmd_get: 20895800
cmd_set: 4218219
conn_yields: 87949
connection_structures: 382
curr_connections: 200
curr_items: 544664
curr_items_tot: 1104237
daemon_connections: 10
decr_hits: 0
decr_misses: 0
delete_hits: 19159
delete_misses: 46094
ep_bg_fetched: 0
ep_commit_num: 42884
ep_commit_time: 14
ep_commit_time_total: 261892
ep_data_age: 967
ep_data_age_highwat: 4169
ep_db_cleaner_status: complete
ep_db_strategy: multiMTVBDB
ep_dbinit: 1
ep_dbname: /opt/couchbase/var/lib/couchbase/data/default-data/default
ep_dbshards: 4
ep_diskqueue_drain: 11795853
ep_diskqueue_fill: 11745166
ep_diskqueue_items: 101594
ep_diskqueue_memory: 8127520
ep_diskqueue_pending: 1647962431
ep_exp_pager_stime: 3600
ep_expired: 4605186
ep_flush_all: false
ep_flush_duration: 2395
ep_flush_duration_highwat: 2395
ep_flush_duration_total: 265095
ep_flush_preempts: 0
ep_flusher_state: running
ep_flusher_todo: 78312
ep_inconsistent_slave_chk: 0
ep_io_num_read: 0
ep_io_num_write: 7105991
ep_io_read_bytes: 0
ep_io_write_bytes: 212157287772
ep_item_begin_failed: 31311
ep_item_commit_failed: 0
ep_item_flush_expired: 4392927
ep_item_flush_failed: 244
ep_items_rm_from_checkpoints: 12145520
ep_keep_closed_checkpoints: 0
ep_kv_size: 13539775241
ep_latency_arith_cmd: 0
ep_latency_get_cmd: 20895800
ep_latency_store_cmd: 4218219
ep_max_data_size: 58195968000
ep_max_txn_size: 1000
ep_mem_high_wat: 43646976000
ep_mem_low_wat: 34917580800
ep_min_data_age: 0
ep_num_active_non_resident: 0
ep_num_checkpoint_remover_runs: 63195
ep_num_eject_failures: 0
ep_num_eject_replicas: 0
ep_num_expiry_pager_runs: 87
ep_num_non_resident: 0
ep_num_not_my_vbuckets: 0
ep_num_pager_runs: 0
ep_num_value_ejects: 0
ep_onlineupdate: false
ep_onlineupdate_revert_add: 0
ep_onlineupdate_revert_delete: 0
ep_onlineupdate_revert_update: 0
ep_oom_errors: 0
ep_overhead: 69958048
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_queue_age_cap: 900
ep_queue_size: 88239
ep_storage_age: 1694
ep_storage_age_highwat: 4988
ep_storage_type: featured
ep_store_max_concurrency: 10
ep_store_max_readers: 9
ep_store_max_readwrite: 1
ep_tap_bg_fetch_requeued: 0
ep_tap_bg_fetched: 0
ep_tap_keepalive: 300
ep_tmp_oom_errors: 0
ep_too_old: 338866
ep_too_young: 0
ep_total_cache_size: 206477473771
ep_total_del_items: 4419139
ep_total_enqueued: 11745166
ep_total_new_items: 5462909
ep_total_persisted: 11524796
ep_uncommitted_items: 0
ep_value_size: 13359046413
ep_vb_total: 1024
ep_vbucket_del: 0
ep_vbucket_del_fail: 0
ep_version: 1.8.0r_78_g3539559
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 13097
get_hits: 17677170
get_misses: 3218630
incr_hits: 0
incr_misses: 0
libevent: 2.0.11-stable
limit_maxbytes: 67108864
listen_disabled_num: 0
mem_used: 13609733289
pid: 27293
pointer_size: 64
rejected_conns: 0
rusage_system: 16651.768670
rusage_user: 24020.357178
tap_checkpoint_end_received: 534571
tap_checkpoint_end_sent: 525805
tap_checkpoint_start_received: 535595
tap_checkpoint_start_sent: 527853
tap_connect_received: 1027
tap_delete_received: 2431447
tap_delete_sent: 2435715
tap_mutation_received: 7729416
tap_mutation_sent: 7630860
tap_opaque_received: 10
tap_opaque_sent: 2054
tap_vbucket_set_sent: 2048
threads: 4
time: 1328883713
total_connections: 6497
uptime: 316091
vb_active_curr_items: 544664
vb_active_eject: 0
vb_active_ht_memory: 12836864
vb_active_itm_memory: 6770127024
vb_active_num: 512
vb_active_num_non_resident: 0
vb_active_ops_create: 2742109
vb_active_ops_delete: 2223028
vb_active_ops_reject: 0
vb_active_ops_update: 850707
vb_active_perc_mem_resident: 100
vb_active_queue_age: 96440318000
vb_active_queue_drain: 5839195
vb_active_queue_fill: 5886124
vb_active_queue_memory: 3754320
vb_active_queue_pending: 774769472
vb_active_queue_size: 46929
vb_dead_num: 0
vb_pending_curr_items: 0
vb_pending_eject: 0
vb_pending_ht_memory: 0
vb_pending_itm_memory: 0
vb_pending_num: 0
vb_pending_num_non_resident: 0
vb_pending_ops_create: 0
vb_pending_ops_delete: 0
vb_pending_ops_reject: 0
vb_pending_ops_update: 0
vb_pending_perc_mem_resident: 0
vb_pending_queue_age: 0
vb_pending_queue_drain: 0
vb_pending_queue_fill: 0
vb_pending_queue_memory: 0
vb_pending_queue_pending: 0
vb_pending_queue_size: 0
vb_replica_curr_items: 559573
vb_replica_eject: 0
vb_replica_ht_memory: 12836864
vb_replica_itm_memory: 6665159131
vb_replica_num: 512
vb_replica_num_non_resident: 0
vb_replica_ops_create: 2720901
vb_replica_ops_delete: 2197060
vb_replica_ops_reject: 0
vb_replica_ops_update: 792274
vb_replica_perc_mem_resident: 100
vb_replica_queue_age: 18446744065870592616
vb_replica_queue_drain: 5956658
vb_replica_queue_fill: 5859042
vb_replica_queue_memory: 4373200
vb_replica_queue_pending: 873192959
vb_replica_queue_size: 54665
version: UNKNOWN
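A single dump only shows totals, so the drain rate has to be worked out from two dumps taken some time apart. Below is a rough sketch of that calculation. It assumes that ep_diskqueue_fill and ep_diskqueue_drain are cumulative counters (the values above suggest this, but it is an assumption) and uses the `time` stat from each dump as the timestamp. The helper names `parse_stats` and `queue_rates` are hypothetical, and the two snapshots in the example are made-up numbers, not real measurements.

```python
def parse_stats(text):
    """Parse 'name: value' lines from a stats dump, keeping integer stats only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def queue_rates(before, after):
    """Compare two parsed dumps and return per-second fill and drain rates."""
    seconds = after["time"] - before["time"]
    if seconds <= 0:
        raise ValueError("snapshots must be taken at different times")
    fill = after["ep_diskqueue_fill"] - before["ep_diskqueue_fill"]
    drain = after["ep_diskqueue_drain"] - before["ep_diskqueue_drain"]
    return {
        "fill_per_s": fill / seconds,
        "drain_per_s": drain / seconds,
        "items_delta": after["ep_diskqueue_items"] - before["ep_diskqueue_items"],
    }


# Made-up snapshots, 10 seconds apart, in the same format as the dump above.
first = parse_stats("""
time: 1000
ep_diskqueue_fill: 5000
ep_diskqueue_drain: 4000
ep_diskqueue_items: 1000
ep_flusher_state: running
""")
second = parse_stats("""
time: 1010
ep_diskqueue_fill: 5300
ep_diskqueue_drain: 4100
ep_diskqueue_items: 1200
ep_flusher_state: running
""")

rates = queue_rates(first, second)
print(rates)
```

In practice the two texts would come from saving the output of the cbstats command shown above twice. If fill_per_s stays above drain_per_s over several samples, the disk queue is growing.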
I'm having the same issue with the drain rate. Here is a screenshot: http://dl.dropbox.com/u/6308768/mb.jpg
Hello,
I think the problem comes from ep_flusher_todo, which is very high and doesn't decrease.
When we run a bucket flush command, the problem comes back in about one day.
Would it be useful to decrease queue_age_cap with /opt/membase/bin/mbflushctl?
Thanks