Membase will exhaust free disk space soon - how to determine what is being stored?

9 replies
Wed, 02/23/2011 - 00:13
esilverberg
Joined: 01/03/2011

 Hi Folks,

I currently run Membase for my app on a single m2.xlarge instance, which provides 15.7 GB of memory for Membase and 403 GB of disk for storage. I brought this instance up last Wednesday. After approximately 7 days of continuous operation, Membase is consuming 195 GB of on-disk storage and reports 13.8M entries in the cache. While I do cache a certain class of data for two weeks, only about 200k objects fall into this class. The remainder are short-lived pieces of data that expire after 15 minutes.

Bottom line is I have absolutely no clue what is represented in those 13.8M entries or 195GB of disk data. I see my disk write queue size hovering around 600/sec, but I do not ever see any disk reads.

If I do not diagnose and correct this issue my node will exhaust disk space in the next week and crash. 
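The one-week estimate follows from the figures above, assuming disk usage keeps growing at the average rate seen so far. That linear growth is a simplifying assumption, and the function name below is only illustrative:

```python
def days_until_full(used_gb: float, total_gb: float, elapsed_days: float) -> float:
    """Estimate days until the disk fills, assuming linear growth."""
    growth_per_day = used_gb / elapsed_days  # average GB consumed per day so far
    remaining_gb = total_gb - used_gb        # free space left on the volume
    return remaining_gb / growth_per_day


# Figures from the post: 195 GB used after 7 days on a 403 GB volume.
print(round(days_until_full(195, 403, 7), 1))
```

With these numbers the result is roughly seven and a half days of headroom.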

How can I inspect what is actually being stored on disk, and determine whether this is a bug in my code or a bug in the membase code that is supposed to expire old data? Is it possible that the data expiry code of membase is just being really, really, really lazy, given the very large amount of disk space available to it on this kind of EC2 instance?

Thanks,

Eric

Wed, 02/23/2011 - 07:40
perry
Joined: 10/11/2010

 Eric, what version of Membase is this?

 

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Wed, 02/23/2011 - 09:17
esilverberg
Joined: 01/03/2011

Hi Perry - It is the latest - 1.6.5

Here is the output from the stats command:

 auth_cmds:                   174
 auth_errors:                 1
 bucket_conns:                114
 bytes_read:                  703420870361
 bytes_written:               1596923768227
 cas_badval:                  0
 cas_hits:                    0
 cas_misses:                  0
 cmd_flush:                   0
 cmd_get:                     3011178029
 cmd_set:                     77332916
 conn_yields:                 27011744
 connection_structures:       164
 curr_connections:            124
 curr_items:                  14302943
 curr_items_tot:              14302943
 daemon_connections:          10
 decr_hits:                   0
 decr_misses:                 0
 delete_hits:                 10754276
 delete_misses:               189150004
 ep_bg_fetched:               0
 ep_commit_num:               804834
 ep_commit_time:              0
 ep_commit_time_total:        373500
 ep_data_age:                 5
 ep_data_age_highwat:         439
 ep_db_cleaner_status:        complete
 ep_db_strategy:              multiMTVBDB
 ep_dbinit:                   0
 ep_dbname:                   /mnt/membase/default
 ep_dbshards:                 4
 ep_expired:                  1243268012
 ep_flush_duration:           2
 ep_flush_duration_highwat:   348
 ep_flush_duration_total:     404519
 ep_flush_preempts:           0
 ep_flusher_state:            running
 ep_flusher_todo:             0
 ep_io_num_read:              0
 ep_io_num_write:             63467697
 ep_io_read_bytes:            0
 ep_io_write_bytes:           505878277095
 ep_item_begin_failed:        0
 ep_item_commit_failed:       0
 ep_item_flush_expired:       15449460
 ep_item_flush_failed:        0
 ep_kv_size:                  6770457729
 ep_max_data_size:            16872636416
 ep_max_txn_size:             1000
 ep_mem_high_wat:             12654477312
 ep_mem_low_wat:              10123581849
 ep_min_data_age:             0
 ep_num_active_non_resident:  0
 ep_num_eject_failures:       0
 ep_num_eject_replicas:       0
 ep_num_expiry_pager_runs:    165
 ep_num_non_resident:         0
 ep_num_not_my_vbuckets:      0
 ep_num_pager_runs:           0
 ep_num_value_ejects:         0
 ep_oom_errors:               0
 ep_overhead:                 93244037
 ep_pending_ops:              0
 ep_pending_ops_max:          0
 ep_pending_ops_max_duration: 0
 ep_pending_ops_total:        0
 ep_queue_age_cap:            900
 ep_queue_size:               385
 ep_storage_age:              3
 ep_storage_age_highwat:      424
 ep_storage_type:             featured
 ep_store_max_concurrency:    10
 ep_store_max_readers:        9
 ep_store_max_readwrite:      1
 ep_tap_bg_fetch_requeued:    0
 ep_tap_bg_fetched:           0
 ep_tap_keepalive:            0
 ep_tmp_oom_errors:           0
 ep_too_old:                  0
 ep_too_young:                0
 ep_total_cache_size:         541785202697
 ep_total_del_items:          24119294
 ep_total_enqueued:           103472341
 ep_total_new_items:          38309967
 ep_total_persisted:          87586991
 ep_vbucket_del:              0
 ep_vbucket_del_fail:         0
 ep_version:                  1.6.5
 ep_warmed_up:                0
 ep_warmup:                   true
 ep_warmup_dups:              0
 ep_warmup_oom:               0
 ep_warmup_thread:            complete
 ep_warmup_time:              11704
 get_hits:                    2186393906
 get_misses:                  824784123
 incr_hits:                   0
 incr_misses:                 0
 libevent:                    2.0.7-rc
 limit_maxbytes:              67108864
 mem_used:                    6863701766
 pid:                         9296
 pointer_size:                64
 rejected_conns:              0
 rusage_system:               68901.530000
 rusage_user:                 36600.190000
 threads:                     4
 time:                        1298477810
 total_connections:           189
 uptime:                      596473
 version:                     1.4.4_364_g056e303
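
A few derived figures make this output easier to read. The sketch below parses `name: value` lines and computes ratios from a handful of the counters above. The interpretation takes the counter names literally (for example, that `ep_io_write_bytes` is total bytes written to disk since startup), which is an assumption rather than something confirmed in this thread; `parse_stats` is an illustrative helper, not part of Membase.

```python
def parse_stats(text: str) -> dict:
    """Parse 'name: value' lines into a dict, converting integers where possible."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        stats[name.strip()] = int(value) if value.isdigit() else value
    return stats


# A subset of the counters posted above.
sample = """
 cmd_get:                     3011178029
 get_hits:                    2186393906
 ep_io_num_write:             63467697
 ep_io_write_bytes:           505878277095
 uptime:                      596473
"""

s = parse_stats(sample)
hit_ratio = s["get_hits"] / s["cmd_get"]
bytes_per_write = s["ep_io_write_bytes"] / s["ep_io_num_write"]
# Assumes uptime is in seconds and uses decimal gigabytes.
gb_written_per_day = s["ep_io_write_bytes"] / 1e9 / (s["uptime"] / 86400)

print(f"hit ratio: {hit_ratio:.3f}")
print(f"average bytes per disk write: {bytes_per_write:.0f}")
print(f"GB written to disk per day: {gb_written_per_day:.1f}")
```

Bytes written is not the same as on-disk file size, since rewrites of the same key count each time, so the per-day figure is an upper bound on growth rather than a measurement of it.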

 

Wed, 02/23/2011 - 10:54
perry
Joined: 10/11/2010

 

 Looks like we are actually expiring things:

 ep_expired:                  1243268012
 ep_item_flush_expired:       15449460
 ep_num_expiry_pager_runs:    165

 

What we don't do is actually "reclaim" disk space.  When we delete an item, it makes a hole that gets reused later on, but the disk space doesn't shrink.
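
The "hole" behaviour can be reproduced with a plain SQLite file. This is a minimal sketch, assuming Membase's data files behave like ordinary SQLite databases with auto-vacuum disabled; the `kv` table is a made-up schema for the demo, not Membase's own.

```python
import os
import sqlite3
import tempfile


def demo_free_pages() -> dict:
    """Insert rows, delete them all, and report file size and free-page count."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    # Make the default explicit: without auto-vacuum, freed pages stay in the file.
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?)",
        ((f"key{i}", b"x" * 1024) for i in range(2000)),
    )
    conn.commit()
    size_full = os.path.getsize(path)

    conn.execute("DELETE FROM kv")
    conn.commit()
    size_after_delete = os.path.getsize(path)
    # Pages released by the delete are kept on a free list for later reuse.
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    conn.close()
    return {
        "size_full": size_full,
        "size_after_delete": size_after_delete,
        "free_pages": free_pages,
    }


print(demo_free_pages())
```

After the delete the file is no smaller, but the free-page count is non-zero: those pages are the holes that later inserts fill.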

 

You can take a backup (http://wiki.membase.org/display/membase/Backup+and+Restore+with+Membase) and then scan through the db files using sqlite syntax to see what's actually stored in there.
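
As a starting point for that inspection, the sketch below lists every table in a SQLite file together with its row count. It deliberately assumes nothing about Membase's schema: the table names are discovered from `sqlite_master`, and the path you pass in (a backup copy, never a live data file) is up to you.

```python
import sqlite3


def summarize_db(path: str) -> dict:
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Names come from sqlite_master itself; quote them as identifiers.
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        conn.close()
```

Calling `summarize_db` on each backup file shows where the rows are concentrated; from there, an ordinary `SELECT ... LIMIT 10` against the largest table shows what the stored entries look like.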

 

Perry

Wed, 02/23/2011 - 12:13
esilverberg
Joined: 01/03/2011

 Interesting, thanks for the info. Given that there is no compaction step for membase, what you are saying then is for a healthy membase instance, over time I should expect to see the used disk space reach the total capacity and just stay there indefinitely? If we looked at Farmville's membase servers, the disk usage graph would be maxed out all the time? 

Can you provide any insight into that 13M "Total Items" statistic? Does that mean there have been 13M total items ever written to membase? Or does that mean that there are 13M total items in membase at this moment? The former makes sense to me; the latter is totally perplexing and I'd definitely need to take the backup and attempt to crack it open in SQLite.

Thanks again for all your help perry, I really do appreciate it. I always try to write SEO-able forum posts so these exchanges will be findable by other membase users in the future. 

Best,

Eric

Wed, 02/23/2011 - 12:14
perry
Joined: 10/11/2010

 There actually is a compaction step, it just can't be done "online".  We can use the SQLite 'VACUUM' command:

- Either take the servers down and vacuum the files in place

OR

- Take a live backup, vacuum it, then shut the servers down, replace the data files and start back up.  The disadvantage is that any changes made after the backup was taken won't be in the restored files...depends on whether your application can deal with that.
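
Either option comes down to running `VACUUM` against a file that no server process has open. A minimal sketch using Python's standard `sqlite3` module follows; `vacuum_file` is an illustrative helper, and it should only be pointed at a stopped server's files or at a backup copy.

```python
import os
import sqlite3


def vacuum_file(path: str) -> tuple:
    """Run VACUUM on a SQLite file and return (bytes_before, bytes_after)."""
    before = os.path.getsize(path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)
```

SQLite's `VACUUM` rebuilds the database into a temporary copy before replacing the original, so it needs free disk space on the order of the file's own size. On a volume that is already close to full, that is worth checking before starting.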

 

You are correct about the disk usage, it "should" reach capacity and then stay there provided there isn't any net-new data added.  

 

As far as the "Total Items" goes, that is the count of active items AND replicas.  "Unique" items is just active items and should match what you've put in.

 

Glad to help Eric, let me know what else I can do to help.

 

Perry

Thu, 02/24/2011 - 12:34
esilverberg
Joined: 01/03/2011

 Hi Perry,

I went ahead and tested what membase does in the face of resource exhaustion, given the problems I have had with it in the past. From what I can tell, membase does not handle resource reclamation or exhaustion well at all.

Here is my test environment:

- One m1.micro EC2 instance
- Membase 1.6.5, 32-bit
- 476 MB of RAM and 7.87 GB of disk allocated

I created a file 1MB in size using the first method listed here, and wrote the following ruby script:

 

Click here for the script on github (code doesn't display properly in this forum). 

I ran this script four times to enter this state. Here is a screen shot of my dashboard. It would appear membase is unable to clear the expired data, which should only have lasted for 10 seconds. I am unable to run a 'stats' command because membase is constantly crashing/exiting, as per this screen shot. 

This is an extremely basic test configuration that anyone should be able to reproduce on their own EC2 micro instance.

Perry, can you explain what is going on, and whether or not this is by design, a bug in membase, or something else? Really I was hoping to test disk exhaustion, which I didn't even get to in this experiment.

Thanks,

Eric

Fri, 02/25/2011 - 10:03
perry
Joined: 10/11/2010

 The memcached crash you're experiencing is a bug: http://forums.membase.org/thread/membase-fall-down-every-2-days#comment-1002648

 

It's already been fixed and will be included in an upcoming release.

 

Membase actually does handle resource exhaustion fairly well...given that you are staying within the bounds of the system.  You should make sure to follow the guidelines outlined here: http://wiki.membase.org/display/membase/Sizing+Guidelines

 

Also, you may want to consider engaging with our sales and system-engineering teams to help you along in the process.  If you're planning on going into production, you really want to be using the Enterprise Edition as it goes through a much more rigorous QA and regression testing process.  It will also include hotfixes (like the one you're running into) long before the Community Edition gets them.

 

Perry

Sat, 02/26/2011 - 15:45
esilverberg
Joined: 01/03/2011

 Thanks for this info Perry. I really appreciate all your help! Unfortunately, given the repeated challenges and problems I've faced with membase, I've decided to replace it and use memcached directly. There were three big reasons why:

1) Membase used more memory than I thought it should. Memcached's steady state is around 250MB; membase's, 4-6GB. This had a very real cost to me: an m1.small is much cheaper than an m2.xlarge.

2) Membase wrote gigabytes of data to disk, inexplicably. By the end, I had a working set of 6GB in RAM and about 25GB written daily to disk. This makes no sense, given that my working set (as observed using memcached directly) is around 250MB. What forced my decision to migrate was that I had no faith that membase would stay up once it exhausted disk space, even though (per this thread) it should continue to function correctly.

3) Clustered membase servers went down together, obviating the advertised redundancy benefit. Twice I had whole clusters die. Auto-rebalancing was the cause of this failure at least once.

All the promised features of membase, I love, want, and need, but in the end it seems like the product still has a ways to go to deliver on its core features. I'd buy an enterprise support license, but unfortunately I am working on a breakeven mobile app, and cannot afford any additional infrastructure expenses. 

I have now written my own app logic that auto-discovers my memcached ec2 instances using the ec2 API, prioritizes them based on AZ (not a concept known to membase), and uses a short socket timeout to fail over to the backup memcached server if the first one doesn't reply in time. 
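
The general shape of that failover logic can be sketched as follows. This is not the code described in the post: the `Server` type, the function names, the zone labels and the 0.2 second timeout are all illustrative assumptions, and the EC2 API discovery step is replaced by a list the caller supplies.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    host: str
    port: int
    zone: str  # availability zone label, e.g. "us-east-1a"


def prioritize(servers, local_zone):
    """Order servers so those in the caller's zone are tried first."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(servers, key=lambda s: s.zone != local_zone)


def connect_first_available(servers, local_zone, timeout=0.2):
    """Return (server, socket) for the first server that connects in time."""
    for server in prioritize(servers, local_zone):
        try:
            sock = socket.create_connection(
                (server.host, server.port), timeout=timeout
            )
            return server, sock
        except OSError:
            continue  # refused, unreachable, or timed out: try the next one
    raise ConnectionError("no memcached server reachable")
```

The socket returned by `socket.create_connection` keeps the same timeout for later reads and writes, so a server that accepts the connection but then stalls also raises a timeout the caller can treat as a signal to fail over.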

Best of luck, and hopefully I will be able to revisit this decision in the coming months after membase has had some additional bake time.

Cheers,

Eric

Sat, 02/26/2011 - 16:43
perry
Joined: 10/11/2010

 Thanks for your detailed response Eric.

 

If you're willing, I'd love to try and address your concerns above in order to gain your confidence and continued use of Membase.  The first 2 issues seem related and the third is just strange.  We have a number of very large deployments both within EC2 and data centers without issue, handling >100k ops/sec and utilizing hundreds of gigabytes of space (nearly terabytes in some cases).  I don't mean to boast, just to provide evidence that the software does work as intended, when given the right attention and monitoring.

 

If Membase doesn't provide you any value over memcached then by all means, there's no reason not to use memcached.  If you do see/need the value of Membase though, I'd be happy to work with you directly to make it successful in your environment.

 

Please feel free to email me directly at perry -at- couchbase -dot- com if you're interested in continuing the conversation.

 

Thanks Eric.

 

Perry
