Home | Forums | Membase | Membase Server 1.7.x

Nodes offline under nominal load

Wed, 07/20/2011 - 06:04
loopforever

Hi all,

I have a small Membase cluster I'm playing with.

The cluster has 1 membase bucket ("default") configured with 1 replica and a per-node RAM quota of 128MB.

Node 1: 2 CPU / 12GB RAM (RHEL5 VM)
Node 2: 24 CPU / 24GB RAM (RHEL5)
Node 3: 24 CPU / 48GB RAM (RHEL6)

I have a small Ruby script that sets objects. My keys are predictably named "1" .. "n".

Frequently while setting objects, nodes seem to randomly go offline. For example, I inserted 2000 32KB objects and took these screenshots while doing it:

http://imageshack.us/g/193/membaseclusterdown.png/

This behavior is very repeatable for me. I can insert 10,000 1KB objects or 1,000 8KB objects, really any variation...eventually nodes go offline.

In some cases, the server will return a "temporary failure", and when I receive that ServerError exception I have my client back off for a few seconds and then try again. This behavior ("dear client, please leave me alone for a second") is expected according to the documentation; however, nodes going offline at random seems unusual to me.
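For anyone following along, the back-off described above could look roughly like the sketch below. This is not the actual script: TemporaryFailure is a hypothetical stand-in for whatever exception the client library raises on "temporary failure", and with_backoff, max_attempts, base_delay and sleeper are names made up for illustration. The doubling delay is also an assumption; a fixed pause of a few seconds would work the same way.

```ruby
# Hypothetical stand-in for the client's "temporary failure" error;
# a real client library defines its own exception class.
class TemporaryFailure < StandardError; end

# Runs the block and retries it when TemporaryFailure is raised,
# doubling the pause after each failed attempt. Gives up and re-raises
# once max_attempts is reached.
# `sleeper` is injectable so the pauses can be skipped or recorded.
def with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue TemporaryFailure
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

Each set call would then be wrapped in a with_backoff block, so only the write that was rejected gets repeated and any other exception still propagates immediately.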

When the nodes go offline, they do eventually come back up. The processes themselves never die; the cluster just transiently perceives the nodes as offline.

I guess my questions are...
1) Am I doing something wrong?
2) How can I investigate this more? (Are there more detailed log files I can review besides those logs in the web interface?)

Thanks in advance!

- Matt

Wed, 07/20/2011 - 12:57
perry

Yes, it does seem that those nodes are actually going offline. We'll need to look deeper into the logs to determine what the cause is.

Can you please run "/opt/membase/bin/mbcollect_info " on each node and send the resulting files to perry@couchbase.com?

Also, can you please check /var/log/messages on each of those nodes for any indication that the Linux OOM killer is interacting with the processes here? You should be able to run 'grep -i oom /var/log/messages' and if you see any output at all, you're likely too low on memory.
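If it is easier to run the same check from a Ruby script on each node, a minimal sketch is below. It only mirrors the case-insensitive grep above; the helper names oom_lines and scan_log are made up for illustration, and the default path assumes the log lives at /var/log/messages as on the RHEL nodes in this thread.

```ruby
# Returns the lines that contain "oom" in any letter case, like `grep -i oom`.
# Accepts any enumerable of strings (an Array, a File enumerator, ...).
def oom_lines(lines)
  lines.grep(/oom/i).map(&:chomp)
end

# Scans a log file for OOM-related lines.
# The default path is an assumption; reading it usually requires root.
def scan_log(path = "/var/log/messages")
  oom_lines(File.foreach(path))
end
```

Calling scan_log and printing the result gives the same lines as the grep. Like the grep, the pattern also matches unrelated words that happen to contain "oom", so any hits still need a quick read.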

Perry


Fri, 07/22/2011 - 05:15
loopforever

Perry,

Thanks for your response. I ended up getting 3 new (identical) VMs provisioned to test out my membase cluster without any contention for resources. In that environment, I have not experienced any of the nodes going offline, so I think we can attribute the behavior in my post above to some of the other processes on those systems competing for resources. If it happens again in my isolated environment I'll be sure to send you the mbcollect_info output.

Thanks again!

- Matt
