Couchbase

Java SDK - rebalance causes all operations to block

3 replies
Thu, 02/07/2013 - 08:26
neilprosser
Joined: 10/23/2012
Groups: None

I've just rebalanced my eight-node cluster to take one node out, upgrade it to have four CPUs and then add it back in.

After adding it back in, the seven machines that were originally in the cluster (which all still have two CPUs) are reporting timeouts.

Our healthcheck, which consists of getting a random GUID key, is failing to talk to Couchbase with the following error message:

Timed out waiting to add Cmd: 0 Opaque: 28838565 Key: f000880b-9bc2-428a-879f-50044b07c6e5(max wait=10000ms)

All nodes are reporting the correct number of items. Our services are issuing the usual number of gets and sets against the cluster, but the Web UI suggests that none of them are getting through.

Has anyone seen the Java SDK fail to pick up changes to the cluster following a rebalance?
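For readers trying to reproduce this, the healthcheck described above (get a random GUID key and treat a slow round trip as a failure) can be sketched roughly as follows. This is a minimal sketch, not our actual code: the asyncGet function is a hypothetical stand-in for the client's asynchronous get, and the timeout value is an assumption. A cache miss counts as healthy because only the round trip matters; an exception thrown while enqueueing the request (as with the "Timed out waiting to add" error) counts as unhealthy.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class HealthCheck {
    // Hypothetical stand-in for the client's asynchronous get: key -> Future of the value.
    private final Function<String, Future<Object>> asyncGet;
    private final long timeoutMillis;

    public HealthCheck(Function<String, Future<Object>> asyncGet, long timeoutMillis) {
        this.asyncGet = asyncGet;
        this.timeoutMillis = timeoutMillis;
    }

    /** Gets a random GUID key; healthy if the round trip finishes within the timeout. */
    public boolean isHealthy() {
        String key = UUID.randomUUID().toString();
        try {
            // The key almost certainly doesn't exist; a miss (null) still counts as healthy.
            asyncGet.apply(key).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (TimeoutException | ExecutionException | RuntimeException e) {
            // RuntimeException also covers failures thrown while enqueueing the request.
            return false;
        }
    }

    public static void main(String[] args) {
        HealthCheck ok = new HealthCheck(k -> CompletableFuture.completedFuture(null), 100);
        HealthCheck stuck = new HealthCheck(k -> new CompletableFuture<>(), 100);
        HealthCheck queueFull = new HealthCheck(k -> {
            throw new IllegalStateException("Timed out waiting to add");
        }, 100);
        System.out.println(ok.isHealthy());        // true
        System.out.println(stuck.isHealthy());     // false
        System.out.println(queueFull.isHealthy()); // false
    }
}
```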

Sun, 02/10/2013 - 14:39
ingenthr
Joined: 03/16/2010
Groups:

I don't think we've seen anything like that, but we have fixed a few rebalance-related issues in the latest release, 1.1.2. They were mostly around handling failures, though, not planned maintenance. Which client version were you using?

The other thing to check is memory pressure: during a rebalance, a cluster that isn't sized well can slow down when memory runs short. You can usually find signs of that in syslog or dmesg output on the cluster nodes.

Sun, 02/10/2013 - 16:04
neilprosser
Joined: 10/23/2012
Groups: None

Sorry, I should have included that I was using version 1.1.1 of the client.

It's also worth noting that most rebalances complete without any side effects. During our testing we've added and removed nodes from the cluster without incident, and performance stayed consistent. Occasionally, though, we have seen both removal and addition of nodes cause problems. It's also strange because, while one service fails, an independent service keeps talking to the cluster with the same timeout settings but a different load (higher or lower, depending on which service is the one failing). Each time we have seen this happen, simply restarting the services that are having trouble (and therefore forcing them to reconnect to the altered cluster) has solved the problem.

I'll upgrade to 1.1.2 and see whether it happens again. I'll probably end up getting four CPUs put into each node, since I missed that four was the recommended number in the documentation when originally setting up the cluster.

If this (or similar) does happen again is there any particular information that it would be helpful for me to get hold of for diagnostic purposes? I'm still getting familiar with the information that Couchbase produces so anything that makes it easier for people to work out what's happening would be great.
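Since restarting the affected services (forcing a reconnect) has fixed it every time so far, one stopgap is to automate that reconnect inside the service. The sketch below is a hypothetical wrapper, not part of the Couchbase SDK: the factory and closer functions stand in for whatever builds and shuts down your real client, and the threshold of consecutive timeouts is an assumption you would tune. It hides the symptom rather than explaining it, so it is no substitute for gathering diagnostics.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical wrapper that mimics "restart the service to force a reconnect":
 * after maxConsecutiveTimeouts timeouts in a row, it closes the current client
 * and builds a fresh one.
 */
public class ReconnectingHolder<C> {
    private final Supplier<C> factory;  // builds a new client (hypothetical)
    private final Consumer<C> closer;   // shuts down an old client (hypothetical)
    private final int maxConsecutiveTimeouts;
    private C client;
    private int consecutiveTimeouts = 0;
    private int rebuilds = 0;

    public ReconnectingHolder(Supplier<C> factory, Consumer<C> closer, int maxConsecutiveTimeouts) {
        this.factory = factory;
        this.closer = closer;
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        this.client = factory.get();
    }

    public synchronized C current() { return client; }

    /** Any successful operation resets the streak. */
    public synchronized void recordSuccess() { consecutiveTimeouts = 0; }

    /** Counts a timeout; rebuilds the client once the streak reaches the threshold. */
    public synchronized void recordTimeout() {
        consecutiveTimeouts++;
        if (consecutiveTimeouts >= maxConsecutiveTimeouts) {
            closer.accept(client);
            client = factory.get();
            consecutiveTimeouts = 0;
            rebuilds++;
        }
    }

    public synchronized int rebuilds() { return rebuilds; }

    public static void main(String[] args) {
        ReconnectingHolder<Object> holder = new ReconnectingHolder<>(Object::new, c -> { }, 3);
        Object first = holder.current();
        holder.recordTimeout();
        holder.recordTimeout();
        holder.recordSuccess(); // streak reset
        holder.recordTimeout();
        holder.recordTimeout();
        holder.recordTimeout(); // third in a row: rebuild
        System.out.println(holder.rebuilds());         // 1
        System.out.println(holder.current() != first); // true
    }
}
```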

Sun, 02/10/2013 - 21:50
ingenthr
Joined: 03/16/2010
Groups:

The only thing I can think of is that the system is under load during the rebalance.

One thing that would be good to gather right now is cbstats timings output from each of the nodes. Then, if we see a similar timeout on an add later, we can check whether it correlates. This will help direct further efforts.

I should give some context. The "Timed out waiting to add" message refers to the client's input queue, so it indicates memory pressure on the client side. You can increase the size of the input queue at the cost of additional memory on the client side, or you can increase the time that the thread will block.

I don't think this is the primary issue, though, since 10 seconds is quite high. I don't know what your workload is, but the client could be backed up with fetches from disk or something along those lines. Gathering cbstats timings should help identify where the problem is.
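To make the input-queue explanation above concrete, here is a simplified model of the mechanism: callers put operations on a bounded queue, an I/O thread drains it, and when the queue stays full a caller blocks for up to a maximum wait and then fails. This is an illustrative model, not the SDK's code; the class and method names are hypothetical, and the exception text merely mirrors the error quoted at the top of the thread. As far as I know, the 1.1.x client is built on spymemcached, whose connection factory builder exposes the maximum block time (opQueueMaxBlockTime, 10 seconds by default) and the operation queue factory; check the javadoc for your version before relying on that.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Simplified model (hypothetical, not the SDK's code) of a client-side input queue:
 * callers wait up to maxBlockMillis for a free slot, then fail.
 */
public class InputQueueModel {
    private final BlockingQueue<String> inputQueue;
    private final long maxBlockMillis;

    public InputQueueModel(int capacity, long maxBlockMillis) {
        this.inputQueue = new ArrayBlockingQueue<>(capacity);
        this.maxBlockMillis = maxBlockMillis;
    }

    /** Enqueue an operation, or throw if no slot frees up within maxBlockMillis. */
    public void addOp(String op) {
        boolean added;
        try {
            added = inputQueue.offer(op, maxBlockMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted waiting to add " + op, e);
        }
        if (!added) {
            throw new IllegalStateException(
                "Timed out waiting to add " + op + "(max wait=" + maxBlockMillis + "ms)");
        }
    }

    /** Stands in for the I/O thread taking one operation off the queue. */
    public String drainOne() { return inputQueue.poll(); }

    public static void main(String[] args) {
        InputQueueModel q = new InputQueueModel(2, 50);
        q.addOp("get a");
        q.addOp("get b");
        try {
            q.addOp("get c"); // queue is full and nothing drains it
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Timed out waiting to add get c(max wait=50ms)
        }
    }
}
```

The model also shows why a bigger queue or a longer wait only buys time: if the draining side stays slower than the callers, the queue fills up again, which is why finding where the operations are slowing down matters more than tuning these two settings.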

© 2013 COUCHBASE All rights reserved.
