Understanding Membase availability / unplanned node failure behavior

Thu, 07/14/2011 - 12:22
loopforever

Hi,

I scoured the forums and read the wiki page here (http://www.couchbase.org/wiki/display/membase/Failover+with+Membase), so I apologize for bringing this up again. I just want to make sure I fully understand the expected behavior of Membase during unplanned/unexpected node failures.

Assume I am using Membase buckets with 2 replicas in a 3 node cluster.

I performed a fairly simple test, generating random key/value pairs and writing them to the cluster:

#!/usr/bin/env ruby

require "rubygems"
require "memcached"
require "securerandom"

# Number of key/value pairs to write, taken from the first command-line argument.
iterations = ARGV[0].to_i

m = Memcached.new(["host1:11211", "host2:11211", "host3:11211"])

iterations.times do
  # Use the same random 16-character hex string as both key and value.
  key = value = SecureRandom.hex(8)

  m.set key, value
end

I insert 200,000 objects.

While this is running, I forcibly shut down one of the nodes in the cluster (just /etc/init.d/membase-server stop).

Moments after that one node goes down, the job that is setting objects fails with this error:

"proxy write to downstream". Key {"13055"=>"host1:11211:8"}. (Memcached::ServerError)

Based on everything I read, it actually sounds like this is the expected (by design) behavior. There is no automated failover, despite the fact that the cluster is aware it has lost a member. As soon as I "failover" that downed node in the GUI, the inserts work again; that too jibes with what I read.

We believe Membase has many of the features we need for our use case, but if the unexpected loss of a single node results in any downtime for the cluster, I think we cannot proceed with our testing.

Is my test valid? If so, how can I make unexpected node failures more transparent? In our case, the data served by our cluster(s) must be available without interruption.

Thanks for your help!

- Matt

Fri, 07/15/2011 - 11:30
perry

Hey Matt, all you've described is correct and by design. With 1.7.1 we are introducing an automatic failover feature so I think that should alleviate most of your concerns.

Keep an eye out for that release in the next few days, and let me know if you have any other questions.

Perry
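
For readers running a similar test before automatic failover is available, one possible stopgap is to retry failed writes on the client while the downed node is being failed over. The sketch below is only an illustration under stated assumptions: with_retry, its parameters, and the backoff values are hypothetical names and numbers, not part of Membase or the memcached gem. It cannot remove the downtime on its own, since writes to the lost node keep failing until a failover happens; it only keeps the job from aborting on the first error.

```ruby
# Hypothetical helper (not part of Membase or the memcached gem):
# run the given block, retrying with exponential backoff when it raises
# error_class. After `attempts` failed tries the last error is re-raised.
def with_retry(error_class: StandardError, attempts: 5, base_delay: 0.1,
               sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield
  rescue error_class
    # Give up once the attempt budget is spent.
    raise if tries >= attempts
    # Wait base_delay, then 2x, 4x, ... before trying again.
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Possible use in the test script above (assumes the memcached gem and the
# `m`, `key`, and `value` variables from that script):
#
#   with_retry(error_class: Memcached::ServerError, attempts: 8) do
#     m.set key, value
#   end
```

The attempt count and delays would need to be tuned to how long a failover takes in your environment; the values here are placeholders.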
