What is the expected startup time behavior?

Thu, 05/05/2011 - 11:12
wsorenson

I have a single Membase server with a 90GB bucket. More than 3 hours after a restart, the bucket is still not available.

I can't find any documentation other than http://techzone.couchbase.com/issues/browse/MB-1905 which suggests that startup time is a major problem for large data sets, which is roughly what I'm seeing.

Thu, 05/05/2011 - 20:58
perry

You are correct that the startup time can be quite significant for large datasets. Vacuuming the DB files before warmup, while also time-intensive, can greatly speed up the warmup process. This is why we suggest vacuuming your backup files, so that in the event you have to restore, it can be done more quickly.

In the current software, you can monitor the progress of the warmup by watching the output of:
"/opt/membase/bin/ep_engine/management/stats [host]:11210 all [bucket_name] [bucket_password] | grep warm"

The next version of our UI will have a better indication of the state of the servers to help monitor this as well.
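
If you would rather script the check than eyeball it, here is a minimal sketch of the filtering step. It assumes the stats tool prints one "name: value" pair per line; the function name and the sample stat names and values are purely illustrative, so check them against what your server actually prints.

```python
def warmup_stats(stats_output):
    """Return stats whose name contains 'warm', roughly like `| grep warm`."""
    result = {}
    for line in stats_output.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep and "warm" in name:
            result[name.strip()] = value.strip()
    return result


# Hypothetical sample output; real stat names and values may differ.
SAMPLE = """\
 curr_items:        9800000
 ep_warmed_up:      9800000
 ep_warmup_thread:  running
"""

for name, value in warmup_stats(SAMPLE).items():
    print(f"{name} = {value}")
```

In practice you would feed it the text captured from the stats command above and call it on a timer.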

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Fri, 05/06/2011 - 00:04
wsorenson

Thanks for responding. After nearly 24 hours, the machine is less than 10m records through a 68m record dataset, rendering the persistence portion of membase irrelevant. It is faster to load a new machine or cluster from scratch, and I can't see how this makes any sense in a production environment - unless this startup time is drastically reduced, why persist "large" data sets?
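
As a rough back-of-envelope projection from those figures, assuming the load rate stays constant (an assumption; the real rate was under 10m records per 24 hours, so this is a lower bound on the total time):

```python
def projected_warmup_hours(total_records, loaded_records, elapsed_hours):
    """Linear extrapolation: total time if the observed load rate holds."""
    rate_per_hour = loaded_records / elapsed_hours
    return total_records / rate_per_hour


# Figures from this thread: 68m records total, at most 10m loaded in ~24 hours.
hours = projected_warmup_hours(68_000_000, 10_000_000, 24)
print(f"at least {hours:.1f} hours, about {hours / 24:.1f} days")
```

That works out to roughly a week for the full bucket, if nothing slows down further.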

It also looks like vacuuming may have performance problems: http://techzone.couchbase.com/issues/browse/MB-1612 ?

I appreciate the product, but don't see how this isn't the Achilles' heel of Membase and how MB-1905 isn't a/the top priority.

Sat, 05/07/2011 - 18:59
perry

Vacuuming doesn't have performance issues...it can just take a long time on a large dataset (for the same reasons your restoration is taking a while).

As for MB-1905, while its priority could be higher, you would still be suffering greatly from having every request serviced from disk...rendering the system essentially unusable for different reasons.

I certainly understand that this is a pain point for you, but there are no other solutions available that make this better. The fact of the matter is that Membase's performance comes from serving data from RAM...and we need to have data in RAM in order to serve it. Loading tens to hundreds of GB into RAM takes time...it's a physics thing ;-)

An option for you going forward would be to distribute your dataset across more nodes so that each one is holding a smaller portion of the entire dataset.
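
To put rough numbers on that, here is a small sketch of how much data each node would hold as the cluster grows. It assumes data is spread evenly across nodes and that each replica copy adds a full extra copy of the dataset; the function name is just for illustration.

```python
def data_per_node_gb(total_gb, nodes, replicas=0):
    """Approximate data held per node, assuming an even spread across nodes."""
    return total_gb * (1 + replicas) / nodes


# A 90GB dataset with one replica copy, on clusters of different sizes.
for nodes in (2, 4, 8):
    print(f"{nodes} nodes: {data_per_node_gb(90, nodes, replicas=1):.1f} GB per node")
```

Less data per node means less to read back from disk on each node after a restart, though how closely warmup time tracks data size will depend on your hardware.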

It's also probably not a really valid answer, but we would of course recommend against restarting a Membase machine ;-). In the event that you have an unexpected restart/crash, you can fail the node over and continue to access the data from the replicas. This wouldn't help in a total cluster outage, but there are always different options for different failure scenarios.

Something else you might want to take a look at is our mbrestore script (http://techzone.couchbase.com/wiki/display/membase/Membase+Server+versio...) which will allow you to load the data from an on-disk file into another cluster. You would then be able to have your application up and running while the data is loaded in. This could also take a while (for the same reasons) but at least you'd have access to the data as it's being loaded.

Let me know if that helps answer some of your concerns.

Going forward, the CouchDB integration in 2.0 will change this behavior, though I don't have enough details to say to what extent.

Perry

Sun, 05/08/2011 - 06:16
wsorenson

The big takeaway here is: don't use Membase as anything other than a cache that can spill items over to disk, with plan-ahead replication, because the rebalance speed seems to be about the same as the bucket warmup speed (correct me if I'm wrong there). Also, a side question: if you add a server and begin rebalancing, is the cluster still supposed to be able to serve requests? We had mixed luck with that.

Thank you for the suggestions - I've already reloaded our dataset on a much larger cluster with replication 1, so I can ignore the old machine whenever it finishes loading the bucket. (It's almost halfway done now. My guess is > 1 week.) But we still need a disaster scenario, because we all know machines sometimes restart, fail to serve requests after a failed rebalance (the reason I restarted Membase), etc. Maybe something like HBase, which is a KV store that defies physics by being able to start up in a reasonable amount of time, read items from disk, and then serve them from RAM. I'm pretty sure I could write a program to load the 10GB of RAM that my machine was using in less than 1 week. In fact, maybe you guys could have a challenge where a candidate has to write a program to read the data files and load them into RAM before Membase finishes loading up - I'd put my money on a smart engineer. (Excuse the tongue-in-cheek, but I assumed we were all on the same page re: the performance of RAM vs disk.)

The point is, there is a performance problem there, and I'd be surprised if you had all your customers/users in one room and we couldn't all agree that Membase would greatly benefit from serving buckets "ASAP", even if items had to be largely served directly from disk and then loaded into RAM - applications would experience a degradation in performance; significant, but not a total loss of data - which is where you're at today. I assume you have to at least load the index, and maybe the architecture complicates things -- I don't know about CouchDB integration, that sounds like a whole other thing, but TL;DR +1 for MB-1905.

Tue, 05/10/2011 - 16:59
perry

I have to disagree that Membase should only be used as a "cache". We actually have a number of very large customers using Membase as their primary datastore (see Zynga for examples). I do agree that there are optimizations that can be made, no doubt there. Feel free to contribute the necessary code to the open-source project (yes, a little tongue-in-cheek for you as well ;-))

The cluster IS designed to continue to function while rebalancing. Again, there could certainly be bugs and I'm willing to work with you to get it working to your expectations.

CouchDB stores the B-tree as the last value on disk, so it's very quick to recover and load that index into RAM and begin serving data immediately...which should effectively solve MB-1905.
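
To illustrate the general idea only, here is a toy append-only store whose index is always the last thing written, so a restart reads a small footer and the index rather than every value. This is not CouchDB's actual file format: the class, the JSON index, and the fixed-size footer are all invented for this sketch, and it rewrites the whole index on every put where a real B-tree would append only the changed nodes.

```python
import io
import json
import struct

# Hypothetical footer layout: index offset and index length, 8 bytes each.
FOOTER = struct.Struct("<QQ")


class AppendOnlyStore:
    def __init__(self, f):
        self.f = f
        self.index = {}  # key -> (offset, length) of the value in the file
        self._recover()

    def _recover(self):
        """Load the index by reading only the tail of the file."""
        end = self.f.seek(0, io.SEEK_END)
        if end < FOOTER.size:
            return  # empty file, nothing to recover
        self.f.seek(end - FOOTER.size)
        offset, length = FOOTER.unpack(self.f.read(FOOTER.size))
        self.f.seek(offset)
        raw = json.loads(self.f.read(length))
        self.index = {key: tuple(loc) for key, loc in raw.items()}

    def put(self, key, value):
        """Append the value, then a fresh index snapshot, then the footer."""
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(value)
        self.index[key] = (offset, len(value))
        index_bytes = json.dumps(self.index).encode()
        index_offset = self.f.tell()
        self.f.write(index_bytes)
        self.f.write(FOOTER.pack(index_offset, len(index_bytes)))

    def get(self, key):
        """Fetch a value from disk on demand using the in-memory index."""
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)


buf = io.BytesIO()
store = AppendOnlyStore(buf)
store.put("a", b"alpha")
store.put("b", b"beta")

# Simulated restart: only the footer and index are read, not the values.
reopened = AppendOnlyStore(buf)
print(reopened.get("a"), reopened.get("b"))
```

The point of the design is that values stay on disk until someone asks for them, so the store can answer requests as soon as the index is in memory.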

Let's be clear, I'm not arguing that our system couldn't be made better...I'm simply stating that I think we still have the best solution available for the use cases that it fits.

Perry

Wed, 05/11/2011 - 10:52
wsorenson

I wonder how many unexpected restarts Zynga has had to deal with in their cluster and what their rebuild process is if all replicas go down. My server ran fine for 80 days before this happened, and FYI - this was the series of events that led up to the restart:

A) Server is getting too slow -- add another
B) Start rebalance - rebalance taking an incredibly long time
C) Stop rebalance, fail over new server.
D) Server no longer responded to requests
E) Restart Server

Now, I'm in a state where the server has restarted itself twice in the last week, and has never loaded the 90GB bucket.

How can I find the reason for those two self-restarts?

As a result, we're looking at a host of other K/V stores with better reliability & recovery times. I'd like to stick to Membase because of the web UI, and replication + persistence scheme (and resulting performance characteristics), but I'm not even sure if Membase will be the same product once it's integrated with CouchDB, which from what I seem to be hearing, is the only point at which MB-1905 is going to be resolved.

Wed, 05/11/2011 - 16:22
perry

Zynga uses many more servers so that each one is smaller and has a faster restart time. They've also segmented their data for various applications/games so that a reboot or failure doesn't affect the entire function.

As far as the issues you've had, I would be very interested to investigate and diagnose what's going on. Do you still have logs available?

Let me be crystal clear (if possible), Membase 2.0 with the CouchDB integration will be the same product (okay, it will be a better product) that will support the same protocol and have an upgrade path from the versions we currently have. There will be lots of new features and bug fixes, one of which will be MB-1905.

I would very much like to have you as a successful user of our software and want to do everything in my power to make that happen.

Perry

Fri, 05/13/2011 - 06:50
wsorenson

I have logs. The server is now restarting itself approx. every 2 days (there is a memcached bucket and a smaller Membase bucket, both of which come up). However, the larger bucket never comes up - perhaps its failed recoveries are causing the restarts.

I will send logs in email.
