
Replica data not properly flushed, TAP sometimes hangs, and eventually node crashes.

No replies
Mon, 12/05/2011 - 04:01
Jell
Joined: 10/28/2010
Groups: None

Hi,

We have been using Membase for a year at low traffic without many problems, but we are now seeing increasing instability as we scale up. Here is what we do; please let us know if anything goes against the Membase philosophy:

- We run three high-memory instances with Ubuntu 10.04 on Amazon Web Services, with one replica and auto-failover enabled (Membase Community Edition, v1.7.1.1).

- Our application performs around 3000 gets and 500 mutations per second, and some of our data expires after only 30 seconds. At peak times we have around one million keys in the database, which is really nothing compared to the size of the instances we are using (16 GB of RAM).

- We make extensive use of CAS updates (we would prefer to just use increment, but we also want to update the time to live).

- Most of our data are counters and locks, but some of it is a serialized hash that can grow to a reasonable size.

- We vacuum the SQLite databases every hour.

- We run a custom TAP script to take a backup each night. We used to take backups with mbbackup, but data that had not yet been flushed to disk was missing, and using the TAP interface gave us better results.
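
As an illustration only, the CAS pattern mentioned above (increment a counter and refresh its time to live in a single guarded write) could look roughly like the sketch below. `FakeStore` is a hypothetical in-memory stand-in with memcached-style `gets`/`add`/`cas` semantics, not the API of any real Membase client, and `incr_with_ttl` is an assumed helper name.

```python
import time


class FakeStore:
    """Hypothetical in-memory stand-in for a memcached-style server."""

    def __init__(self):
        self._data = {}      # key -> (value, cas_id, expires_at)
        self._next_cas = 1

    def gets(self, key):
        """Return (value, cas_id), or (None, None) if missing or expired."""
        entry = self._data.get(key)
        if entry is None or entry[2] <= time.time():
            return None, None
        return entry[0], entry[1]

    def add(self, key, value, ttl):
        """Store only if the key is absent or expired."""
        if self.gets(key)[1] is not None:
            return False
        self._store(key, value, ttl)
        return True

    def cas(self, key, value, cas_id, ttl):
        """Store only if cas_id still matches the current entry."""
        current = self.gets(key)[1]
        if current is None or current != cas_id:
            return False
        self._store(key, value, ttl)
        return True

    def _store(self, key, value, ttl):
        # Every successful write gets a fresh CAS id and a fresh expiry.
        self._data[key] = (value, self._next_cas, time.time() + ttl)
        self._next_cas += 1


def incr_with_ttl(store, key, delta, ttl, max_retries=10):
    """Increment a counter and refresh its TTL, retrying on CAS conflicts."""
    for _ in range(max_retries):
        value, cas_id = store.gets(key)
        if cas_id is None:
            # Key is missing or expired: try to create it.
            if store.add(key, delta, ttl):
                return delta
            continue  # another writer created it first; retry
        new_value = value + delta
        if store.cas(key, new_value, cas_id, ttl):
            return new_value
    raise RuntimeError("too much contention on %r" % key)
```

Each call costs at least one read plus one write, and more under contention, which is why a plain increment would be cheaper if it could also reset the expiry.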

Now the symptoms:

- The number of replica items on a node keeps growing.

- RAM consumption can suddenly and dramatically increase on a node, and our TAP backup hangs. (I don't know whether the hanging TAP script is the cause or a symptom. We send a simple DUMP request to each node and run it locally, so I'm not sure how this can become a problem.)

- Eventually, one of the nodes crashes and auto-failover kicks in, but we lose a very large amount of data (up to 24 hours' worth!). This indicates that neither replication nor persistence is working as we would expect.
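
One way to put numbers on the symptoms above is to poll each node's statistics and watch the replica item count, memory use, and disk write queue over time. The following is a minimal sketch under several assumptions: that the node answers the memcached text-protocol `stats` command on the given host and port (the `127.0.0.1:11211` default here is a guess and depends on the bucket setup), and that the stat names in `WATCHED` match what the server reports. Both should be checked against the actual `stats` output.

```python
import socket

# Assumed stat names; verify them against the node's real `stats` output.
WATCHED = ("curr_items", "vb_replica_curr_items",
           "ep_queue_size", "ep_flusher_todo", "mem_used")


def parse_stats(raw):
    """Parse a text-protocol stats reply ("STAT name value" lines, then "END")."""
    stats = {}
    for line in raw.decode("ascii", "replace").splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, timeout=5.0):
    """Send `stats` to one node and return the parsed reply as a dict."""
    # The timeout keeps the poller from blocking forever on a stuck node.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf)


def watched_values(stats):
    """Keep only the watched stats that the node actually reported."""
    return {name: stats[name] for name in WATCHED if name in stats}
```

Logging `watched_values(fetch_stats(...))` for each node every minute or so would show whether the write queue and replica counts start climbing before a crash or only after the TAP backup begins.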

Any suggestions? Our cluster crashes about once a week on average, which is quite a problem for us.

Best Regards,
Jean-Louis
