Membase 1.7 Rebalance error
Unable to rebalance a Membase cluster.
Error message: Rebalance exited with reason {mover_failed,{badmatch,true}} (repeated 2 times)
Membase configuration: 2 servers in the cluster.
One server was taken down, had more RAM added to it, and was then put back into the cluster.
We have a Membase default bucket and two Memcached buckets on different ports across two nodes. However, with 1.7.1 rebalancing stops at 68.5%. I have tried several times without any luck; it simply gets stuck and doesn't continue.
The only thing that seems to work is to recreate the buckets on the new server, add the existing server from there and start rebalancing. However, this means complete data loss :-(
Yes, I wouldn't recommend that as a "workaround" as it's not going to do what you want...
When it stops at 68.5%, does it eventually timeout with an error message?
With a lot of data, a rebalance can be expected to take quite some time. However, all your data should remain accessible throughout the process, so it doesn't really matter how long it takes.
Can you see any CPU or disk load? When you look at the graphs in the UI, do you see data "moving" (in the TAP and disk queue sections)?
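The questions above boil down to telling a slow rebalance from a stuck one: a slow rebalance still creeps forward, while a stuck one shows a flat progress line. A minimal sketch of that check, assuming you record the progress percentage periodically yourself (how the readings are obtained, e.g. from the web console, is left out). The function name `is_stalled`, the 10-minute window and the 0.1-point threshold are hypothetical choices, not Membase settings.

```python
from typing import Sequence, Tuple

# Each sample is (seconds_since_start, percent_complete), oldest first.
Sample = Tuple[float, float]


def is_stalled(samples: Sequence[Sample], window_s: float = 600.0,
               min_delta: float = 0.1) -> bool:
    """Return True if progress advanced less than `min_delta` percentage
    points over the last `window_s` seconds (hypothetical heuristic)."""
    if not samples:
        return False
    latest_t, latest_pct = samples[-1]
    if latest_pct >= 100.0:
        return False  # finished, not stalled
    # Require history covering the whole window before calling it a stall.
    if samples[0][0] > latest_t - window_s:
        return False
    # Oldest sample still inside the window.
    window = [(t, p) for t, p in samples if latest_t - t <= window_s]
    oldest_pct = window[0][1]
    return latest_pct - oldest_pct < min_delta


if __name__ == "__main__":
    stuck = [(0, 68.5), (300, 68.5), (600, 68.5)]  # flat line
    slow = [(0, 40.0), (300, 41.0), (600, 42.5)]   # still moving
    print(is_stalled(stuck))  # True
    print(is_stalled(slow))   # False
```

A flat line over a long window, combined with no TAP or disk queue activity in the graphs, points to a stuck rebalance rather than a merely slow one.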
Perry
I have the same rebalance problem as Phoenix in version 1.7.1.
No data is moving and the graphs are empty.
I have 3 server nodes. I lost one, failed it over, added it back and rebalanced: it stopped at 5%, the next time at 16%, and now at 0%.
So I removed the server (losing its data) and added it back as a new node, but rebalance still doesn't work. It stops at 5.2% and the graphs show no data moving.
Ondrej Prochazka
You will probably want to look into updating to 1.7.2. Please see the download page and release notes for that release.
We have already tried 1.7.2 in some installations, but we face similar problems. The rebalance simply gets stuck. You can stop it and try to rebalance again and again, and sometimes you are lucky. But this is definitely not the way it should be!
Rebalancing is currently the most critical issue we have in our setups.
Just a bit of background on this...
To make rebalancing more efficient, between 1.6 and 1.7 we enhanced the protocol used to transfer the data. That has largely been successful, but it did introduce some issues. 1.7.2 delivers fixes for most of them, and we had aimed to fix all of them, but at least one more was found after 1.7.2. We're definitely aware of it and are working on more updates.
The good news is that although a rebalance may get stuck, it can always be retried and should eventually complete. Because of the nature of this issue, it's better to rebalance new nodes in, if you can, rather than rebalance nodes out to shrink a cluster.
We've seen some recent rebalancing issues with 1.7.0. Can you upgrade those servers to 1.7.1?
You should also be able to retry the rebalance multiple times until it completes...
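The "just retry it" advice can be sketched as a loop that restarts the rebalance whenever an attempt fails or stalls, up to a fixed limit. This is a minimal sketch only: `start_rebalance` and `wait_for_outcome` are hypothetical callables standing in for however you trigger and monitor a rebalance (web console, CLI or REST API); they are not Membase functions, and the attempt limit of 5 is an arbitrary assumption.

```python
from typing import Callable


def rebalance_with_retries(start_rebalance: Callable[[], None],
                           wait_for_outcome: Callable[[], str],
                           max_attempts: int = 5) -> int:
    """Retry until an attempt reports "done"; return the attempts used.

    `wait_for_outcome` (hypothetical) is assumed to block until the attempt
    ends and return "done", "failed" (e.g. mover_failed) or "stalled"
    (no progress; assumed to have stopped the rebalance before returning).
    """
    for attempt in range(1, max_attempts + 1):
        start_rebalance()
        outcome = wait_for_outcome()
        if outcome == "done":
            return attempt
        print(f"attempt {attempt}: {outcome}")
    raise RuntimeError(f"rebalance did not complete after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated outcomes: two bad attempts, then success.
    outcomes = iter(["stalled", "failed", "done"])
    print(rebalance_with_retries(lambda: None, lambda: next(outcomes)))  # 3
```

Capping the attempts keeps a persistently failing rebalance from looping forever, which is the point at which an upgrade or a support request makes more sense than another retry.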
Perry