Could writes cause read timeouts?

Wed, 12/07/2011 - 15:27
dan
Offline
Joined: 01/05/2011
Groups: None

12-node 1.7.1 cluster, 50GB per node allocated to the cluster for a total of 600GB.

Twice a day we load 160 million keys into a membase bucket with replication enabled at ~50k a second. We have reads going at about 2-15k a second. What we find is that our spymemcached clients start timing out requests, but only from one server. What doesn't make sense is that reads coming out of a different bucket also start timing out. Network traffic into each node is only 20 Mbit/s, so it's not a network issue (we have a gigabit network). On the node that has the timeouts, memcached is only using 200% CPU (the other 11 nodes are only at around 70%). We have dual hexacore processors with hyperthreading, so we aren't anywhere close to maxing out that server. Our dataset fits completely in memory (although for some weird reason it's only showing a 97% resident rate even though the bucket is only 40% full).
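
As a rough sanity check on the numbers above, the load can be broken down per node. This is only a back-of-the-envelope sketch: it assumes keys spread evenly across the 12 nodes and one replica copy per key (the replica count is not stated in the post), and the function name `load_profile` is made up for illustration.

```python
def load_profile(total_keys, writes_per_sec, nodes, replicas=1):
    """Estimate load duration and per-node write rates.

    Assumes keys are distributed evenly across nodes and that every
    active write is also applied once per replica copy.
    """
    duration_sec = total_keys / writes_per_sec
    active_per_node = writes_per_sec / nodes
    # Replica traffic lands on other nodes, but with an even spread each
    # node receives roughly `replicas` times its own active rate on top.
    total_per_node = active_per_node * (1 + replicas)
    return duration_sec, active_per_node, total_per_node


# Figures from the post: 160 million keys, ~50k writes/s, 12 nodes.
duration, active, total = load_profile(160_000_000, 50_000, 12)
print(f"load takes about {duration / 60:.0f} minutes")
print(f"about {active:.0f} active writes/s per node, "
      f"{total:.0f} writes/s including replicas")
```

Under those assumptions each load runs for a little under an hour, and each node handles a few thousand active writes a second, which supports the point that the cluster should not be under heavy stress.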

Once writing of the active items finishes across the cluster (replica writes are still going) the issue stops.

Is this just an issue with vbuckets being migrated? If so, why does it affect other buckets when we are only writing into one bucket? Can we adjust the number of read threads to improve performance? We could throttle the writes down to see if that would alleviate the problem, but we shouldn't be anywhere close to stressing the cluster.

Thanks,
Dan
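
For anyone who wants to try the write-throttling idea from this post, one common approach is a token bucket on the loader side. The following is a minimal sketch, not anything from Membase or spymemcached: `TokenBucket` and `throttled_load` are hypothetical names, and `write` stands in for whatever client call actually stores a key.

```python
import time


class TokenBucket:
    """Simple token bucket: allows `rate` operations per second on
    average, with bursts of up to `capacity` operations."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full, so an initial burst is allowed
        self.clock = clock          # injectable for testing
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, n=1):
        """Block until n tokens are available, then consume them."""
        if n > self.capacity:
            raise ValueError("cannot acquire more tokens than the bucket holds")
        while True:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep just long enough for the missing tokens to accumulate.
            self.sleep((n - self.tokens) / self.rate)


def throttled_load(items, write, bucket):
    """Call write(key, value) for each item, paced by the token bucket."""
    for key, value in items:
        bucket.acquire()
        write(key, value)
```

A loader could then call something like `throttled_load(pairs, client_set, TokenBucket(rate=25_000, capacity=1_000))`, where the rate of 25k a second is an arbitrary example chosen to halve the load described above. Lowering the rate until the read timeouts disappear would show whether write pressure is really the trigger.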

Wed, 12/07/2011 - 16:12
dan
Offline
Joined: 01/05/2011
Groups: None

Okay, I did some digging. It looks like our node has some issues.

See: http://img822.imageshack.us/img822/1001/72440843.png

The active/replica resident rates and the user data in RAM are all way off.

Tomorrow I'll fail the node out and hope the replicas we have are correct. There have been no errors logged or any indication of what might have happened.

Wed, 12/07/2011 - 17:38
ingenthr
Offline
Joined: 03/16/2010
Groups:

From that image, I can't necessarily tell if things are off or not, but a misbehaving node can definitely give you timeouts at the app level. Either restarting or rebalancing the node is probably advisable.

Was there anything in the logs?

By the way, you can use mbstats timings to see how long operations are taking at the node's level.

Wed, 12/07/2011 - 17:46
dan
Offline
Joined: 01/05/2011
Groups: None

The active/replica resident rates should be 100%, not 67% and -2%.

The logs tab in the UI shows nothing wrong.

I won't restart the node since it'll take too long to bring it back in. It'd be faster to just fail it out.

Sun, 12/18/2011 - 01:21
ingenthr
Offline
Joined: 03/16/2010
Groups:

Are things behaving better now?

Mon, 12/19/2011 - 09:04
dan
Offline
Joined: 01/05/2011
Groups: None

Thanks for the follow-up.

The node was actually fine; all the other buckets showed correct information. I ended up deleting the bad bucket and recreating it.

Honestly, every few months we have had to just recreate buckets because they become unstable and we can't figure out why. It wouldn't be an issue if we were using membase as a simple caching layer, but we want to use it as an in-memory data store, and we can't simply rebuild buckets every time things go haywire.

Dan

Mon, 12/19/2011 - 22:32
ingenthr
Offline
Joined: 03/16/2010
Groups:

I totally understand, and you shouldn't need to. I'd strongly consider moving to 1.7.2 if you can. There had been a couple of checkpoint-related issues that could possibly cause this, but that's not a behavior we've normally seen. Usually it manifests itself in other ways.
