Failure during rebalance

I've just installed Couchbase 2.1.0 on a couple of machines and tried to create a cluster out of them. The "Server Nodes" page shows a failover warning saying that a rebalance is required because not all data is replicated. When I try to rebalance, it fails after a few seconds and tells me to check the logs for more information.

Here's what I have in the logs:

15:28:59 - Mon Jul 1, 2013 Starting rebalance, KeepNodes = ['ns_1@odadb01.cern.ch',
'ns_1@odadb02.cern.ch'], EjectNodes = []
15:28:59 - Mon Jul 1, 2013 Started rebalancing bucket sync_gateway
15:29:00 - Mon Jul 1, 2013 Bucket "sync_gateway" rebalance appears to be swap rebalance
15:29:00 - Mon Jul 1, 2013 Started rebalancing bucket oda
15:29:01 - Mon Jul 1, 2013 Bucket "oda" rebalance appears to be swap rebalance
15:29:01 - Mon Jul 1, 2013 Started rebalancing bucket default
15:29:01 - Mon Jul 1, 2013 Bucket "default" rebalance appears to be swap rebalance
15:29:20 - Mon Jul 1, 2013 <0.5412.4> exited with {unexpected_exit,
{'EXIT',<0.5419.4>,
{badmatch,
[{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[<12888.10125.3>,had_backfill,30000]}}}]}}}
15:29:20 - Mon Jul 1, 2013 Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.5419.4>,
{badmatch,
[{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[<12888.10125.3>,had_backfill,
30000]}}}]}}}

This doesn't help me much. Could somebody shed some light on this? It happens even when I delete all my buckets and the default one is empty.

Hello,

Let's do this as a two-step process:
1- Work around the problem
Since you say you can "delete" everything, I invite you to do the following (as I said, it is a workaround):
a) Uninstall Couchbase
b) Install Couchbase and create a single node without any bucket in it (no default or sample bucket)
c) Install one or more nodes, and add them to the cluster
d) Do the rebalance
> The rebalance without any bucket should be immediate. (This is what I use when I create a cluster from the command line with Ansible: see http://blog.couchbase.com/create-couchbase-cluster-with-ansible )
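
For reference, steps b) to d) could look roughly like the sketch below. It is only an illustration, not a tested procedure: the host names are taken from the log above, xxx/yyy are placeholder credentials, the RAM size is just an example value, and the server-add and rebalance options are written from memory of the 2.x command line, so please check them against the help output of couchbase-cli on your install. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: prints the commands unless DRY_RUN=0.
# Host names come from the log above; ADMIN/PASSWORD are placeholders.
CLI=/opt/couchbase/bin/couchbase-cli
FIRST=odadb01.cern.ch:8091
SECOND=odadb02.cern.ch:8091
ADMIN=xxx
PASSWORD=yyy
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

workaround() {
    # b) initialise the first node as a one-node cluster (no bucket is created here)
    run "$CLI" cluster-init -c "$FIRST" \
        --cluster-init-username="$ADMIN" \
        --cluster-init-password="$PASSWORD" \
        --cluster-init-ramsize=5120
    # c) add the second node to the cluster
    run "$CLI" server-add -c "$FIRST" -u "$ADMIN" -p "$PASSWORD" \
        --server-add="$SECOND" \
        --server-add-username="$ADMIN" \
        --server-add-password="$PASSWORD"
    # d) rebalance the (still empty) cluster
    run "$CLI" rebalance -c "$FIRST" -u "$ADMIN" -p "$PASSWORD"
}

workaround
```

Running it in dry-run mode first lets you review the three commands before anything touches the cluster.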

2- Trying to identify the source of the problem
So let's now identify the problem. Can you reproduce the issue and follow the steps documented here:
http://www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchba...

(Be sure to run cbcollect_info on all nodes.)
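
As a rough illustration of collecting the logs on every node, the sketch below prints one collection command per node. It assumes the tool is installed as /opt/couchbase/bin/cbcollect_info and takes the output zip file as its argument, that you have ssh access to the nodes, and that /tmp is an acceptable place for the archive; the host names are the ones from the log above, and all of these are assumptions to adapt.

```shell
#!/bin/sh
# Print one log-collection command per node (nothing is executed here).
# Node names are taken from the rebalance log; adjust to your cluster.
NODES="odadb01.cern.ch odadb02.cern.ch"

collect_cmd() {
    # $1 = node host name; the zip is named after the node (hypothetical path)
    echo "ssh $1 /opt/couchbase/bin/cbcollect_info /tmp/$1.zip"
}

for node in $NODES; do
    collect_cmd "$node"
done
```

Once the printed commands look right, they can be run by hand or piped to sh.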

Send me a PM with the location of the logs once they are uploaded to the Couchbase S3 drive.

Regards
Tug
@tgrall

Hello Tug,
I'm trying to do what you suggest as a workaround, but the cluster-init command fails:

/opt/couchbase/bin/couchbase-cli cluster-init -c localhost:8091 --cluster-init-username=xxx --cluster-init-password=yyy --cluster-init-port=8092 --cluster-init-ramsize=5120
ERROR: bootstrap failed (404) Object Not Found
{u'reason': u'missing', u'error': u'not_found'}

This is on a fresh install. I tried to google the error, but I didn't find any information.

Cheers,
Alex

Hello Tug,

I saw somewhere that I have to do a node-init first, although your instructions don't mention it. It doesn't help, though:

[odadb01] ~ > /opt/couchbase/bin/couchbase-cli node-init -c odadb01:8091 --node-init-data-path=/data --node-init-index-path=/data
SUCCESS: init odadb01
[odadb01] ~ > /opt/couchbase/bin/couchbase-cli cluster-init -c odadb01:8091 --cluster-init-username=xxx --cluster-init-password=yyy --cluster-init-port=8092 --cluster-init-ramsize=5120
ERROR: bootstrap failed (404) Object Not Found
{u'reason': u'missing', u'error': u'not_found'}
[odadb01] ~ > service couchbase-server status
couchbase-server is not running

It seems like 'erl' is crashing, or at least there are some crash dumps in /opt/couchbase/var/lib/couchbase/ after running cluster-init.

I don't have an enterprise support contract. Can I still upload the data you asked for?

Cheers,
Alex

1 Answer

What are the specs of the machines you are running on (memory, disk, CPU)?
Do you have problems rebalancing when adding new nodes, or when removing nodes?

They're VMs, so they're not amazing, but they do meet the minimums: 4 CPUs, 8 GB of RAM and 80 GB of disk. They have very little data on them, 35 MB according to the admin interface.

I've never been able to rebalance successfully. Anyway, what I'm doing is essentially adding a new node.