"I'm not responsible for this vbucket" errors after membase server reboot

Mon, 10/03/2011 - 10:20
kevnord
Joined: 10/19/2010

Greetings,
Last week we ran into a lot of issues when one of our Windows Membase machines blue-screened and rebooted. The node took ~45 minutes to load the 4GB bucket from disk. While the machine was crashing, booting up, and reloading the bucket, our client apps ran into issues and IIS pegged the CPU on the client machines.

System overview.
4 Client (Web App) servers using Enyim's .NET Membase client v2.11 - Windows Server 2008 R2 64bit
4 Membase v1.7.1 servers - Windows Server 2008 (not R2) 64bit
MembaseCluster: v1.7.1 Membase bucket type, Replication (1 backup)

Our client app was logging a lot of errors like the following:
NOT_MY_VBUCKET: I'm not responsible for this vbucket

Is this an expected error?

We did not fail over the machine that rebooted since it came back up relatively fast. We did not know that loading the bucket back into memory would take so long and cause so much disruption.

What steps should we have taken? Should we have failed over immediately? Should the bad server have been removed and then re-added to the cluster? If anyone can provide the best steps to take when a server fails, that would be fantastic.

Thanks
Kevin

Mon, 10/03/2011 - 14:19
alex
Joined: 08/29/2011

Hi Kevin,

NOT_MY_VBUCKET is the expected response if that server node is warming up. The map that the node uses to determine which vbuckets it is responsible for lives in the sqlite database, along with the keys/metadata/values that are being warmed up. The node needs to completely finish warming up before it can start serving keys for any of its vbuckets, and also before it can reference the vbucket map to understand which vbuckets it is responsible for in the first place. Hence the error you are seeing.

In your scenario, after determining that the node was taking too long to warm up, you would have been better off stopping the restart, failing the node over, and then re-adding the downed node and rebalancing afterwards.

You are probably wondering "how do I know that my node is taking too long to warm up?" or "how long will my node take to warm up?".

You can use the info below to determine whether a warmup or a failover makes more sense.

To gauge warm up time for a given node, we generally look at a few metrics within the logs to get a sense of the rate at which Membase can process the items you have on disk. We look at:
curr_items - The number of items currently active on this node.
curr_items_tot - Total number of items this node knows about (active and replica).
ep_warmed_up - Number of items retrieved from disk - this should increase during warmup.
ep_warmup_oom - Number of out of memory exceptions received from the server while loading data into RAM
ep_warmup_thread - The status of the warmup thread.
ep_warmup_time - How long the warmup thread was running for in microseconds.

[root@localhost bin]# /opt/membase/bin/mbstats localhost:11210 all | egrep 'warm|curr'
curr_connections: 9
curr_items: 52287
curr_items_tot: 104051
ep_store_max_concurrency: 10
ep_warmed_up: 104051
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 1964924
vb_active_curr_items: 52287
vb_pending_curr_items: 0
vb_replica_curr_items: 51764

We can see that the warmup thread took about 1.96 seconds (ep_warmup_time is 1964924 microseconds) and warmed up 104k items, 52k of which are active. Usually we divide the number of items warmed up by the warmup time to get the rate at which the server is processing items; here that is roughly 53,000 items per second. Using this rate, we can get a sense of how long it should take to fully warm up a node based on its total number of items.
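
As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the stats arrive as "name: value" lines like the mbstats output above; the helper names and the 10,000,000-item node are hypothetical, and the estimate assumes the load rate stays roughly constant, which may not hold for larger values or a busy disk.

```python
def parse_stats(text):
    """Parse 'name: value' lines (as printed by mbstats) into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def warmup_rate(stats):
    """Items loaded per second; ep_warmup_time is reported in microseconds."""
    seconds = int(stats["ep_warmup_time"]) / 1_000_000
    return int(stats["ep_warmed_up"]) / seconds


def estimate_warmup_seconds(total_items, items_per_second):
    """Projected warmup time for a node holding total_items items."""
    return total_items / items_per_second


# Values copied from the mbstats output above.
SAMPLE = """\
curr_items: 52287
curr_items_tot: 104051
ep_warmed_up: 104051
ep_warmup_thread: complete
ep_warmup_time: 1964924
"""

stats = parse_stats(SAMPLE)
rate = warmup_rate(stats)
# Hypothetical node size, chosen only to show the projection.
projected = estimate_warmup_seconds(10_000_000, rate)

print(f"warmup rate: {rate:.0f} items/s")
print(f"projected warmup for 10,000,000 items: {projected:.0f} s")
```

If the projected time is longer than the disruption you can tolerate, that is the point at which a failover is likely the better choice.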

Hope this helps.

-Alex.

Mon, 10/03/2011 - 14:53
kevnord
Joined: 10/19/2010

Hi Alex,
Thanks for the extremely helpful reply! What steps would you suggest when a machine needs to be rebooted to apply Windows Updates, etc.? Should we remove the server, rebalance, perform the necessary maintenance, reboot, add the server back to the cluster, and rebalance again? Does that sound like the best approach?

Thanks again for your insight. It's much appreciated!
-k

Mon, 10/03/2011 - 15:49
alex
Joined: 08/29/2011

Hi Again Kevin,

Not a problem, we're here to help.

For scheduled maintenance, the steps you have outlined are exactly what we would recommend.

thanks

-Alex.
