a2b out of memory
Trying to load up some data into a membase install and getting "a2b out of memory" errors after a while. The persistent storage seems to be working ok for quite some time before this happens, though - I notice the sqlite databases growing consistently over the course of the data load. We're not doing anything dramatic (maybe 100-300 ops/sec to load it). Our memory allocation is just set to the minimum (256MB) on this test server.
My expectation was that as the size of the data grew and began to hit the watermarks, it would begin filling the disk with these items (which is what it appeared to do). Then we began seeing the error above. What's interesting is that once we hit the "a2b out of memory" error, membase doesn't appear to flush the rest of the data to disk. So, for example, if I resume the data load again after a few minutes, it still gives the error (even after an hour). We're using the 1.6.0.1 community release.
Thanks for the quick reply, Perry. After watching it for a while, I think that's exactly the case: at some point the server just gets too far behind. I'm trying to use flushctl to set the mem_low_wat and mem_high_wat params, but it's giving me this error:
TypeError: set_flush_param() takes exactly 3 arguments (2 given)
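For anyone hitting the same TypeError: the traceback further down this thread shows that clitool.py dispatches commands with f[0](mc, *args[2:], ...), so every word after the command name becomes a positional argument. Here is a minimal sketch of how a missing value would produce this kind of error. The (mc, key, val) signature and the host/command/param argument order are assumptions for illustration, not the real flushctl source. The message wording in the post is from Python 2; Python 3 phrases it differently.

```python
def set_flush_param(mc, key, val):
    # Hypothetical stand-in for the flushctl handler; the three-argument
    # signature is an assumption based on the error message.
    return "set %s to %s" % (key, val)


def execute(handler, mc, args):
    # Mirrors the dispatch line seen in the traceback:
    #   f[0](mc, *args[2:], **opts.__dict__)
    # args[0] is assumed to be host:port and args[1] the command name.
    return handler(mc, *args[2:])


def try_call(args):
    # Return either the handler result or the TypeError text.
    try:
        return execute(set_flush_param, object(), args)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Parameter name given but no value: the handler gets 2 arguments, needs 3.
print(try_call(["localhost:11210", "set", "mem_low_wat"]))
# Name and value both given: the call goes through.
print(try_call(["localhost:11210", "set", "mem_low_wat", "161061273"]))
```

If the real tool works the way this sketch assumes, passing the parameter name and its value as two separate words would be the thing to try.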
Had some more time to investigate this... It seems like the trouble starts when I issue a "flush" after having a series of those error messages above. After doing so the ep disk queue value never changes (it stays at 0) so the server quickly begins reporting 'temporary failure' and never recovers. At this point even the "stats" tool reports the following:
Traceback (most recent call last):
  File "./stats", line 163, in <module>
    main()
  File "./stats", line 160, in main
    c.execute()
  File "/opt/membase/1.6.0.1/ep-engine/management/clitool.py", line 42, in execute
    f[0](mc, *args[2:], **opts.__dict__)
  File "./stats", line 34, in g
    f(*args[:n])
  File "./stats", line 53, in stats_all
    stats_formatter(mc.stats())
  File "/opt/membase/1.6.0.1/ep-engine/management/mc_bin_client.py", line 244, in stats
    cmd, opaque, cas, klen, extralen, data = self._handleKeyedResponse(None)
  File "/opt/membase/1.6.0.1/ep-engine/management/mc_bin_client.py", line 83, in _handleKeyedResponse
    raise MemcachedError(errcode, rv)
mc_bin_client.MemcachedError: Memcached error #130: Out of memory
bfolkens,
Our Engineers here at Membase are looking into your problem. I will get back to you as soon as we resolve this.
Thanks
Bhawana
Thanks so much for the help. Not sure if this helps your team or not, but I just tried sizing the instance up to 512MB instead of 256 and I still experienced the same symptoms (where the system stops flushing and ep_queue_size stays at 0). I can also confirm it happens from a fresh install without a flush first (at first it seemed like it happened after a flush, but it happens regardless).
What OS are you running this on? Is it 32-bit or 64-bit?
Bhawana
32-bit
I haven't been able to reproduce the error that you are seeing. How long after you started loading the data did you see it? You mentioned in your first post that you started getting this error quite some time after you began loading the data.
Bhawana
Yes, it was several hours in, after a few hundred thousand keys had been set.
Hello,
I have not been able to reproduce the errors you see. What are the values of your mem_high_wat and mem_low_wat?
I plan to keep adding data and see if I get the error and will let you know if I see anything.
Bhawana
We're currently experiencing the same issue. To test membase read/write performance when the RAM is full (but the disk is not), I've set up a membase server with a 100MB bucket and started writing to it. After a while, we get the "SERVER_ERROR a2b out of memory". The disk queue seems to flush constantly and we see some RAM evictions happening.
ep_mem_high_wat: 402653184
ep_mem_low_wat: 322122547
I'll try running this again and giving you a list of the ./stats output. Is there another debug output you'd like to see?
bfolkens,
Are you using the default bucket?
Bhawana
Here's a fresh run:
membase-server v1.6.0.1
x86 platform
gcc v4.3.4
Configured to use 256MB of ram (out of 1.7GB), default bucket, 0 replicas, 1 server
$ ./stats localhost:11210 all
auth_cmds: 1
auth_errors: 0
bucket_conns: 3
bytes_read: 4111710999
bytes_written: 451661495
cas_badval: 0
cas_hits: 0
cas_misses: 0
cmd_flush: 1
cmd_get: 1162293
cmd_set: 2325506
conn_yields: 0
connection_structures: 8
curr_connections: 8
curr_items: 2324571
curr_items_tot: 2324571
daemon_connections: 5
decr_hits: 0
decr_misses: 0
delete_hits: 0
delete_misses: 0
ep_bg_fetched: 0
ep_commit_num: 10398
ep_commit_time: 0
ep_commit_time_total: 2144
ep_data_age: 2
ep_data_age_highwat: 12
ep_dbinit: 0
ep_dbname: /home/membase/1.6.0.1/data/ns_1/default
ep_dbshards: 4
ep_expired: 0
ep_flush_duration: 0
ep_flush_duration_highwat: 8
ep_flush_duration_total: 2175
ep_flush_preempts: 0
ep_flusher_state: running
ep_flusher_todo: 0
ep_io_num_read: 0
ep_io_num_write: 2324571
ep_io_read_bytes: 0
ep_io_write_bytes: 3937810793
ep_item_commit_failed: 0
ep_item_flush_expired: 0
ep_item_flush_failed: 0
ep_kv_size: 255507713
ep_max_data_size: 268435456
ep_max_txn_size: 250000
ep_mem_high_wat: 201326592
ep_mem_low_wat: 161061273
ep_min_data_age: 0
ep_num_eject_failures: 639999137
ep_num_expiry_pager_runs: 4
ep_num_non_resident: 2324571
ep_num_not_my_vbuckets: 0
ep_num_pager_runs: 939
ep_num_value_ejects: 2324571
ep_oom_errors: 228
ep_overhead: 12927624
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_queue_age_cap: 900
ep_queue_size: 0
ep_storage_age: 0
ep_storage_age_highwat: 9
ep_storage_type: featured
ep_tap_keepalive: 0
ep_tmp_oom_errors: 707
ep_too_old: 0
ep_too_young: 0
ep_total_cache_size: 4076044533
ep_total_del_items: 0
ep_total_enqueued: 2324572
ep_total_new_items: 2324571
ep_total_persisted: 2324571
ep_vbucket_del: 0
ep_vbucket_del_fail: 0
ep_version: 1.6.0_10_g3b4878a
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 0
get_hits: 0
get_misses: 1162293
incr_hits: 0
incr_misses: 0
libevent: 2.0.8-rc
limit_maxbytes: 67108864
mem_used: 268435337
pid: 6304
pointer_size: 32
rejected_conns: 0
rusage_system: 50.975250
rusage_user: 502.402623
threads: 4
time: 1289588201
total_connections: 14413
uptime: 17354
version: 1.4.4_298_g250909b
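For anyone reading a dump like this, here is a small sketch that parses the "name: value" lines and pulls out the handful of numbers that matter for this problem. The sample is copied from the output above. The 75% and 60% watermark ratios are simply what these particular values work out to, not documented defaults, and the helper name parse_stats is made up for this example.

```python
def parse_stats(text):
    # Turn "name: value" lines from ./stats output into a dict of strings.
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


# A few lines copied from the stats output posted above.
SAMPLE = """\
curr_items: 2324571
ep_max_data_size: 268435456
ep_mem_high_wat: 201326592
ep_mem_low_wat: 161061273
ep_num_non_resident: 2324571
ep_oom_errors: 228
ep_queue_size: 0
mem_used: 268435337
"""

stats = parse_stats(SAMPLE)
quota = int(stats["ep_max_data_size"])

# Watermarks expressed as a fraction of the memory quota.
high_ratio = int(stats["ep_mem_high_wat"]) / quota
low_ratio = int(stats["ep_mem_low_wat"]) / quota

# Every item's value has been ejected, yet memory is still above the
# high watermark, so ejecting values can no longer free anything.
all_ejected = stats["ep_num_non_resident"] == stats["curr_items"]
above_high = int(stats["mem_used"]) > int(stats["ep_mem_high_wat"])

print("high watermark ratio:", round(high_ratio, 2))
print("low watermark ratio:", round(low_ratio, 2))
print("all values ejected:", all_ejected)
print("still above high watermark:", above_high)
```

An empty disk queue (ep_queue_size: 0) combined with everything ejected and memory still above the high watermark is the pattern to look for.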
Thanks bfolkens, looks like there's definitely something unexpected going on there.
We're engaging with our engineers now to look at it.
Can you tell me how large your items are?
Perry
Around 6-8kB
So we've done a bit of analysis here.
Your stats output was very helpful, and showed that all the available memory is being taken up by item metadata.
Looking at some stats in specific:
-mem_used (mem_used: 268435337) is right at the memory limit (ep_max_data_size: 268435456), with only 119 bytes to spare.
-The software has ejected the values of all of the items (ep_num_value_ejects: 2324571 and ep_num_non_resident: 2324571, the same as curr_items), yet the memory has not been reclaimed.
-If we take the memory used (mem_used: 268435337) and divide it by the number of items (curr_items: 2324571), it comes to about 115 bytes per item (about 110 if we use ep_kv_size: 255507713 instead), which is almost exactly the amount of per-item overhead we have.
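The arithmetic behind these points can be checked with a few lines of Python. This is only a sketch using the numbers from the posted stats output. The observation that mem_used equals ep_kv_size plus ep_overhead is something these particular values show, not a documented definition.

```python
# Values copied from the stats output posted earlier in the thread.
mem_used = 268435337
ep_max_data_size = 268435456
ep_kv_size = 255507713
ep_overhead = 12927624
curr_items = 2324571

# Bytes left before the configured memory limit is reached.
headroom = ep_max_data_size - mem_used

# Per-item cost, counting all memory used vs. only the key/value store.
per_item_total = mem_used / curr_items
per_item_kv = ep_kv_size / curr_items

print("headroom (bytes):", headroom)
print("bytes per item (mem_used):", round(per_item_total, 1))
print("bytes per item (ep_kv_size):", round(per_item_kv, 1))
print("kv_size + overhead == mem_used:", ep_kv_size + ep_overhead == mem_used)
```

Since every value has already been ejected, what remains per item is essentially key plus metadata, which is why adding load only produces more out-of-memory errors.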
The solution here is to add more memory or store fewer items, and I have filed a bug to improve the behavior when this happens to make it easier to figure out.
Let me know if you need any further clarification on this.
Thanks Perry
Thanks for clarifying, Perry. So that's total memory for the cluster that needs to be available to store the metadata, correct? Each node only stores the metadata for the keys on that particular node?
Running into the exact same issue and wondering the same as above.
Correct, each node only stores the metadata for the keys that are on that particular node. One thing to keep in mind is that a node not only stores its active items but any replica items that it is also responsible for.
Perry
Hrm, ok - I think that just killed it for us since we've got something like 80 million keys (and growing) and were hoping to spread it across commodity hardware (~1.5GB per node) - but even if we had 6 nodes with 1.5GB apiece it would be mostly metadata.
Are there any future plans to flush LRU metadata out to disk as well?
Sounds like you are correct in that you need more than 9GB of RAM in order to store over 80 million keys...not much can be done about that.
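A rough way to size this, as a sketch only: multiply the key count by the per-item overhead seen in this thread (about 110 bytes, an observed figure for this version rather than a guaranteed one), and multiply again for replicas, since a node also holds metadata for the replica items it is responsible for. The function names and the assumption that a node's entire RAM is available for metadata are made up for this example.

```python
import math


def metadata_bytes(num_keys, bytes_per_item=110, replicas=0):
    # Metadata is held for the active copy plus each replica copy.
    return num_keys * bytes_per_item * (1 + replicas)


def nodes_needed(num_keys, node_ram_bytes, bytes_per_item=110, replicas=0):
    # Assumes all of a node's RAM can hold metadata, leaving no room
    # for cached values, so this is a lower bound.
    total = metadata_bytes(num_keys, bytes_per_item, replicas)
    return math.ceil(total / node_ram_bytes)


keys = 80_000_000
node_ram = int(1.5 * 1024 ** 3)  # ~1.5GB per node, as in the post above

print("metadata bytes, no replicas:", metadata_bytes(keys))
print("nodes needed, no replicas:", nodes_needed(keys, node_ram))
print("nodes needed, 1 replica:", nodes_needed(keys, node_ram, replicas=1))
```

With the roughly 115 bytes per item that mem_used suggests, the no-replica total rises to about 9.2GB, which lines up with the "more than 9GB" estimate.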
As far as flushing metadata out to disk, we have certainly considered it but I don't know that there are any concrete plans to implement that. One of the nice features about Membase (inherited from memcached) is the ability to VERY quickly tell you that an item DOES NOT exist rather than possibly spending multiple seconds looking up an item's location on disk just to return with "not found" to the client after making it wait so long.
There are improvements that can be done to reduce the amount of overhead per-item, and those will be evaluated and implemented as necessary.
Perry
Take a look at this wiki entry and see if that helps clear up the behavior: http://wiki.membase.org/display/membase/Growing+Data+Sets+Beyond+Memory
I would expect that you would be able to put more items in (unless the disk is actually full, which I doubt) after the write queues have drained sufficiently.
You can watch the write queue in the UI by going to a particular bucket's statistics page, clicking on "Configure View" and selecting "Disk Write Queue".
Perry