Default bucket has "Hard Out of Memory Error"

Fri, 07/15/2011 - 09:16
jasaid
Offline
Joined: 06/22/2011
Groups: None

We are experiencing the following error on our live membase servers:

Hard Out Of Memory Error. Bucket "default" on node "node.two.co.uk" is full. All memory allocated to this bucket is used for metadata.

We resolved the issue by increasing the memory allocated to the bucket, but we did not expect this error at all. Below is the configuration of our Membase cache.

Number of nodes in cluster: 2
Number of buckets in total: 6
Bucket with error: default
Bucket using replication: Yes
Bucket per node memory allocation: 512MB
The membase bucket monitor page on the web console reported the bucket to be 53% full (http://node.two.co.uk:8091/index.html#sec=monitor_buckets)

Given that the bucket was only 53% full, why were we getting this error?

Fri, 07/15/2011 - 11:26
perry
Offline
Joined: 10/11/2010
Groups:

I'd need a bit more data to give a complete answer, but that error is generated when Membase is unable to allocate any more RAM to take in new items. The 53% used does seem a bit odd; can you show me a screenshot of the monitoring page?

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Mon, 07/18/2011 - 04:11
slath@citywire.co.uk
Offline
Joined: 06/27/2011
Groups: None

Hi Perry,

Here is a link to a screenshot of the page on which the default bucket was reporting ~53% RAM used when we were getting "hard out of memory error". We allocated more RAM to the bucket to resolve this issue. If this occurs again is there anything we can do that will help diagnose this issue?

Also, could you clarify whether this is actually just a warning, or an error as the message suggests? Do the new items coming in not get written at all? Do we completely lose the new items, or are they just persisted? Does the Enyim client get a response that describes what is going on, or is it an unhandled exception?

Note: replication was enabled on the bucket, and all items going into it were set to expire after 15 minutes.

Mon, 07/18/2011 - 11:15
perry
Offline
Joined: 10/11/2010
Groups:

Thanks. Can you run the following command against all servers in the cluster and post the output:
Windows: C:\Program Files\Membase\Server\bin\mbstats :11210 all
Linux: /opt/membase/bin/mbstats :11210 all
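
To gather this from every node in one go, something like the following could work. It is only a sketch: the host names passed in are placeholders, the `host:port all` argument shape is assumed from the commands above, and `mbstats_command` and `collect_all` are hypothetical helper names, not part of Membase.

```python
import subprocess

# Linux path quoted in the post above; adjust for Windows installs.
MBSTATS = "/opt/membase/bin/mbstats"


def mbstats_command(host, port=11210, mbstats=MBSTATS):
    """Build the argument list for `mbstats host:port all` (assumed usage)."""
    return [mbstats, f"{host}:{port}", "all"]


def collect_all(hosts, run=subprocess.run):
    """Run mbstats against every host and return {host: raw stdout text}.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = {}
    for host in hosts:
        proc = run(mbstats_command(host), capture_output=True, text=True, check=False)
        results[host] = proc.stdout
    return results
```

Calling `collect_all(["node.one.example", "node.two.example"])` (placeholder names) on a machine with mbstats installed would return the raw text per node, ready to paste into a reply.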

It would be helpful to do this now, but more helpful after you start getting those errors (and before you increase the size of the bucket).

To clarify, the "hard out of memory" errors indicate an actual error condition: Membase is unable to allocate RAM to take in new data. A "soft" error implies that we are in the process of draining data to disk to make space for more; the "hard" errors, by contrast, require some administrative intervention to allocate more RAM.

ANYTIME you get an error from a set (anything other than "TRUE" or "STORED") you can assume that the data did not get written at all. There are some errors that might just require a retry, others that are more pathological.
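
As a rough illustration of that rule, here is a minimal sketch of a retry wrapper. It assumes a hypothetical `set_fn` callable that wraps whatever client you use and returns the raw status of the set; `StoreError` and `set_with_retry` are made-up names for this example, and the retry counts and delays are arbitrary.

```python
import time


class StoreError(Exception):
    """Raised when a set never returned a success status (hypothetical)."""


def set_with_retry(set_fn, key, value, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call set_fn(key, value) until it reports success or attempts run out.

    Anything other than True or "STORED" is treated as "not written".
    Returns the number of tries it took.
    """
    last_status = None
    for attempt in range(attempts):
        last_status = set_fn(key, value)
        if last_status is True or last_status == "STORED":
            return attempt + 1
        # The item was not written; back off before the next try.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise StoreError(f"set({key!r}) failed after {attempts} attempts: {last_status!r}")
```

Retrying only helps with the transient kind of error; for a hard out-of-memory condition the wrapper will simply exhaust its attempts and raise, which at least makes the lost write visible to the caller.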

Perry

Wed, 10/26/2011 - 08:22
chrisribe
Offline
Joined: 09/26/2011
Groups: None

Hi,

I am having this issue also.

One of my two replicated Membase servers failed after a "Hard Out Of Memory Error. Bucket..." message. I had a hard time just failing over and restarting the set, which left my Membase cluster out of service for 40 minutes.

My bucket was also about 50% full, and the servers could not be failed over or removed without timeouts.

How often do your Membase instances die or have issues per year (under Amazon EC2)?

This is not the first time Membase has failed me; I thought replication would protect me. To get things running again I had to delete my default bucket, re-add it, and rebalance/stop repeatedly.

Are there any known workarounds or bugfixes available? Anything to watch out for under Amazon EC2 hosting?
Here is my log file of the event.
http://www.mediafire.com/?otw8d0y4y54fs8b

Mon, 12/05/2011 - 13:45
ep
Offline
Joined: 12/05/2011
Groups: None

We experienced the same error while evaluating Membase. It has been running for a few weeks, doing primarily writes.

We started with two nodes and a 30GB per-node memory allocation. After running fine for a week or so (occasional timeouts, but that is another story), we added a third node. It took a long time to rebalance (26 hours?), and at the end it reported that there were errors during the rebalance, but I was unable to find any detail whatsoever about said errors. It appeared to run fine for days afterwards.

There are about 130M objects in the "default" bucket. Two nodes are running CentOS 5 and the new, third node runs CentOS 6.

This morning the newest node started reporting:
"Hard Out Of Memory Error. Bucket "default" on node 10.xx.yy.zz is full. All memory allocated to this bucket is used for metadata. (repeated 119 times)"

The "CLUSTER OVERVIEW" reports that of 87.8GB allocated, 67.5GB are in use with 20.3GB unused. The stats for the node reporting errors indicate 7.9G for active user data in RAM, 9.24G for replica user data in RAM for a total of 24.2 user data in RAM. Metadata is 269M active, 269M replica, 538M total in RAM. There is nothing listed for pending at all. Disk queues are empty.
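
For a sense of scale, the figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the cluster quota is split evenly across the three nodes and that "M" and "GB" are binary units, neither of which is stated in the console output.

```python
# Figures as quoted from the cluster overview and node stats.
allocated_gb, used_gb, unused_gb = 87.8, 67.5, 20.3
nodes = 3
metadata_active_mb, metadata_replica_mb = 269, 269

# Used plus unused should add up to the allocated total (within rounding).
totals_consistent = abs((used_gb + unused_gb) - allocated_gb) < 0.05

# Assumption: quota is divided evenly between the nodes.
per_node_quota_gb = allocated_gb / nodes
metadata_total_mb = metadata_active_mb + metadata_replica_mb
metadata_share = metadata_total_mb / (per_node_quota_gb * 1024)

print(f"totals consistent: {totals_consistent}")
print(f"per-node quota ~{per_node_quota_gb:.1f} GB, metadata share ~{metadata_share:.1%}")
```

Under those assumptions the reported metadata is a small slice of the per-node quota, which is hard to square with "all memory allocated to this bucket is used for metadata".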

It is not viable to add more RAM as we allocated about as much as the boxes are capable of (30GB on 32GB servers).

The memcached process is still running hot even though there have been no new reads or writes for over two hours.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6273 membase   20   0 32.5g  30g 1848 S 100.5 95.5   1274:59 memcached

I do have full mbstats output available if that helps.

Any suggestions for further diagnostics, solutions, or steps forward would be greatly appreciated.

Mon, 12/05/2011 - 14:20
chrisribe
Offline
Joined: 09/26/2011
Groups: None

Hi ep,

Are you running under Amazon EC2?
After getting this error I got another odd one reporting that my attached /mnt drive was now read-only!
It seems that this can occur on some failing EC2 hardware. I started a new instance, installed Membase, and added the new node. I then dropped the bad EC2 instance, and I have been running with no errors for about a month now.

Hope that helps.
Chris

Mon, 12/05/2011 - 14:39
ep
Offline
Joined: 12/05/2011
Groups: None

Hi Chris,

No EC2. We are running on 3 decent servers, on the same gigabit LAN, with local disk.

No indications from the kernel that there were any disk or filesystem problems.

2GB of swap are in use, but there is still plenty of swap free.

And hours later, with no read or write operations happening, the memcached process is still using a full CPU!

Thanks,
Erik

Sun, 01/15/2012 - 21:29
chrisribe
Offline
Joined: 09/26/2011
Groups: None

Any progress on the "Hard Out Of Memory Error"?
Anyone?

Wed, 06/06/2012 - 11:49
jasaid
Offline
Joined: 06/22/2011
Groups: None

Hi Perry,

We have started getting this error again.

Below are the results from the mbstats command:

 accepting_conns:                1
 auth_cmds:                      97
 auth_errors:                    0
 bucket_active_conns:            1
 bucket_conns:                   85
 bytes_read:                     61624279639
 bytes_written:                  32169935674
 cas_badval:                     0
 cas_hits:                       0
 cas_misses:                     0
 cmd_flush:                      1
 cmd_get:                        2396431
 cmd_set:                        2241536
 conn_yields:                    18
 connection_structures:          1906
 curr_connections:               945
 curr_items:                     548
 curr_items_tot:                 548
 daemon_connections:             10
 decr_hits:                      0
 decr_misses:                    0
 delete_hits:                    0
 delete_misses:                  0
 ep_bg_fetched:                  0
 ep_commit_num:                  67855
 ep_commit_time:                 0
 ep_commit_time_total:           4064
 ep_data_age:                    1
 ep_data_age_highwat:            11
 ep_db_cleaner_status:           complete
 ep_db_strategy:                 multiMTVBDB
 ep_dbinit:                      0
 ep_dbname:                      e:/membase/data/default-data/default
 ep_dbshards:                    4
 ep_diskqueue_drain:             167162
 ep_diskqueue_fill:              167162
 ep_diskqueue_items:             0
 ep_diskqueue_memory:            0
 ep_diskqueue_pending:           0
 ep_expired:                     254492
 ep_flush_all:                   false
 ep_flush_duration:              0
 ep_flush_duration_highwat:      12
 ep_flush_duration_total:        4126
 ep_flush_preempts:              0
 ep_flusher_state:               running
 ep_flusher_todo:                0
 ep_io_num_read:                 3126
 ep_io_num_write:                262991
 ep_io_read_bytes:               37714564
 ep_io_write_bytes:              5204945519
 ep_item_begin_failed:           0
 ep_item_commit_failed:          0
 ep_item_flush_expired:          220571
 ep_item_flush_failed:           0
 ep_items_rm_from_checkpoints:   513111
 ep_kv_size:                     756340090
 ep_latency_arith_cmd:           0
 ep_latency_get_cmd:             2442038
 ep_latency_store_cmd:           2241536
 ep_max_data_size:               954204160
 ep_max_txn_size:                1000
 ep_mem_high_wat:                715653120
 ep_mem_low_wat:                 572522496
 ep_min_data_age:                0
 ep_num_active_non_resident:     0
 ep_num_checkpoint_remover_runs: 217435
 ep_num_eject_failures:          0
 ep_num_eject_replicas:          0
 ep_num_expiry_pager_runs:       324
 ep_num_non_resident:            0
 ep_num_not_my_vbuckets:         90619
 ep_num_pager_runs:              95760
 ep_num_value_ejects:            0
 ep_onlineupdate:                false
 ep_onlineupdate_revert_add:     0
 ep_onlineupdate_revert_delete:  0
 ep_onlineupdate_revert_update:  0
 ep_oom_errors:                  1925404
 ep_overhead:                    61805631
 ep_pending_ops:                 0
 ep_pending_ops_max:             0
 ep_pending_ops_max_duration:    0
 ep_pending_ops_total:           0
 ep_queue_age_cap:               900
 ep_queue_size:                  0
 ep_storage_age:                 1
 ep_storage_age_highwat:         605
 ep_storage_type:                featured
 ep_store_max_concurrency:       10
 ep_store_max_readers:           9
 ep_store_max_readwrite:         1
 ep_tap_bg_fetch_requeued:       0
 ep_tap_bg_fetched:              0
 ep_tap_keepalive:               300
 ep_tmp_oom_errors:              8611
 ep_too_old:                     0
 ep_too_young:                   0
 ep_total_cache_size:            1567393132
 ep_total_del_items:             220591
 ep_total_enqueued:              487497
 ep_total_new_items:             220052
 ep_total_persisted:             483582
 ep_uncommitted_items:           0
 ep_value_size:                  756228893
 ep_vb_total:                    1024
 ep_vbucket_del:                 446
 ep_vbucket_del_avg_walltime:    2338
 ep_vbucket_del_fail:            0
 ep_vbucket_del_max_walltime:    95640
 ep_vbucket_del_total_walltime:  1042920
 ep_version:                     1.6.5.3_257_g82152fd
 ep_warmed_up:                   2102
 ep_warmup:                      true
 ep_warmup_dups:                 0
 ep_warmup_oom:                  0
 ep_warmup_thread:               complete
 ep_warmup_time:                 199342130
 get_hits:                       202141
 get_misses:                     2194290
 incr_hits:                      0
 incr_misses:                    0
 libevent:                       2.0.11-stable
 limit_maxbytes:                 67108864
 listen_disabled_num:            0
 mem_used:                       818145721
 pid:                            1220
 pointer_size:                   64
 rejected_conns:                 0
 tap_checkpoint_end_received:    550796
 tap_checkpoint_end_sent:        22191750
 tap_checkpoint_start_received:  574321
 tap_checkpoint_start_sent:      22208274
 tap_connect_received:           107
 tap_delete_received:            53631
 tap_delete_sent:                260684
 tap_flush_sent:                 21
 tap_mutation_received:          1247360
 tap_mutation_sent:              159893726
 tap_opaque_received:            1425
 tap_opaque_sent:                260
 threads:                        4
 time:                           1319104633
 total_connections:              4905407
 uptime:                         1168762
 vb_active_curr_items:           548
 vb_active_eject:                0
 vb_active_ht_memory:            12775424
 vb_active_itm_memory:           18450332
 vb_active_num:                  512
 vb_active_num_non_resident:     0
 vb_active_ops_create:           74058
 vb_active_ops_delete:           73510
 vb_active_ops_reject:           0
 vb_active_ops_update:           18008
 vb_active_perc_mem_resident:    100
 vb_active_queue_age:            0
 vb_active_queue_drain:          167162
 vb_active_queue_fill:           167162
 vb_active_queue_memory:         0
 vb_active_queue_pending:        0
 vb_active_queue_size:           0
 vb_dead_num:                    0
 vb_pending_curr_items:          0
 vb_pending_eject:               0
 vb_pending_ht_memory:           0
 vb_pending_itm_memory:          0
 vb_pending_num:                 0
 vb_pending_num_non_resident:    0
 vb_pending_ops_create:          0
 vb_pending_ops_delete:          0
 vb_pending_ops_reject:          0
 vb_pending_ops_update:          0
 vb_pending_perc_mem_resident:   0
 vb_pending_queue_age:           0
 vb_pending_queue_drain:         0
 vb_pending_queue_fill:          0
 vb_pending_queue_memory:        0
 vb_pending_queue_pending:       0
 vb_pending_queue_size:          0
 vb_replica_curr_items:          0
 vb_replica_eject:               0
 vb_replica_ht_memory:           12775424
 vb_replica_itm_memory:          0
 vb_replica_num:                 512
 vb_replica_num_non_resident:    0
 vb_replica_ops_create:          0
 vb_replica_ops_delete:          0
 vb_replica_ops_reject:          0
 vb_replica_ops_update:          0
 vb_replica_perc_mem_resident:   0
 vb_replica_queue_age:           0
 vb_replica_queue_drain:         0
 vb_replica_queue_fill:          0
 vb_replica_queue_memory:        0
 vb_replica_queue_pending:       0
 vb_replica_queue_size:          0
 version:                        1.4.4_461_gf99c147
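
For anyone comparing their own output, here is a small sketch that parses this `name: value` format and pulls out the memory-related counters. The interpretation is an assumption based on the stat names only (`ep_max_data_size` as the bucket quota, `ep_mem_high_wat` as the high watermark, `ep_oom_errors` and `ep_tmp_oom_errors` as hard and temporary OOM counters); `parse_mbstats` and `memory_summary` are hypothetical helper names.

```python
def parse_mbstats(text):
    """Parse 'name: value' lines into a dict; purely numeric values become ints."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like e:/path stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        stats[name.strip()] = int(value) if value.isdigit() else value
    return stats


def memory_summary(stats):
    """Compare mem_used against the quota and high watermark (assumed meanings)."""
    return {
        "used_fraction_of_quota": stats["mem_used"] / stats["ep_max_data_size"],
        "above_high_watermark": stats["mem_used"] > stats["ep_mem_high_wat"],
        "hard_oom_errors": stats["ep_oom_errors"],
        "temp_oom_errors": stats["ep_tmp_oom_errors"],
    }


# Values copied from the output above.
sample = """
 ep_max_data_size:               954204160
 ep_mem_high_wat:                715653120
 ep_mem_low_wat:                 572522496
 ep_oom_errors:                  1925404
 ep_tmp_oom_errors:              8611
 mem_used:                       818145721
"""

summary = memory_summary(parse_mbstats(sample))
print(summary)
```

With the numbers posted here, `mem_used` sits above `ep_mem_high_wat` and at roughly 86% of `ep_max_data_size`, even though `curr_items` is only 548.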
 

Thu, 06/14/2012 - 17:15
ingenthr
Offline
Joined: 03/16/2010
Groups:

Many, many memory accounting issues have been fixed in the 1.8 series. Have you considered upgrading to 1.8? See couchbase.com/docs for details on what that entails.
