"I'm not responsible for this vbucket" errors after Membase server reboot
Last week we ran into a lot of issues when one of our Windows Membase machines blue-screened and rebooted. Once it was back up, the node took about 45 minutes to load the 4 GB bucket from disk. For the whole time it took the machine to blue-screen, boot up, and reload the bucket, our client apps ran into issues and IIS pegged the CPU on the client machines.
4 client (web app) servers using Enyim's .NET Membase client v2.11 - Windows Server 2008 R2 64-bit
4 Membase v1.7.1 servers - Windows Server 2008 (not R2) 64-bit
Membase cluster: v1.7.1, Membase bucket type, replication (1 backup)
Our client app was logging a lot of errors like the following:
NOT_MY_VBUCKET: I'm not responsible for this vbucket
Is this an expected error?
We did not fail over the machine that rebooted, since it came back up relatively quickly. We did not know that loading the bucket back into memory would take so long and cause so much disruption.
What steps should we have taken? Should we have failed over immediately? Should the bad server have been removed and then re-added to the cluster? If anyone can provide the best steps to take when a server fails, that would be fantastic.