Details
Description
Setup
1. Set up an 18-node cluster with 2 buckets: bucket1, bucket2
2. Enable auto-failover
3. Add a new node 126
4. Rebalance
Output
1. Rebalance works fine, but these log messages appear:
Could not automatically failover node 'ns_1@10.3.121.126' because I think rebalance is running auto_failover000 ns_1@10.3.2.104 19:32:12 - Sun Jun 17, 2012
Bucket "bucket1" loaded on node 'ns_1@10.3.121.126' in 0 seconds. ns_memcached001 ns_1@10.3.121.126 19:32:04 - Sun Jun 17, 2012
Started rebalancing bucket bucket2 ns_rebalancer000 ns_1@10.3.2.104 19:31:36 - Sun Jun 17, 2012
Starting rebalance, KeepNodes = ['ns_1@10.3.2.85','ns_1@10.3.2.86',
'ns_1@10.3.2.87','ns_1@10.3.2.88',
'ns_1@10.3.2.89','ns_1@10.3.2.104',
'ns_1@10.3.2.105','ns_1@10.3.2.106',
'ns_1@10.3.2.108','ns_1@10.3.2.109',
'ns_1@10.3.2.110','ns_1@10.3.2.111',
'ns_1@10.3.2.112','ns_1@10.3.2.113',
'ns_1@10.3.2.114','ns_1@10.3.2.115',
'ns_1@10.3.121.126'], EjectNodes = []
Attached are the web-logs and logs from master node-104.
https://s3.amazonaws.com/bugdb/jira/web-log-largeCluster/ns-diag-20120618095246.txt
https://s3.amazonaws.com/bugdb/jira/web-log-largeCluster/10.3.2.104-8091-diag.txt.gz
Other related conversation
I have enabled auto-failover on the large cluster, and every time I rebalance in a node I get an error message: "Could not automatically failover node 'ns_1@10.3.121.126' because I think rebalance is running".
Node 126 is newly added and a rebalance was issued. Is this message displayed because the node is not yet ready to join the cluster?
The rebalance works fine, but I do not understand why auto-failover is attempted here. Any idea?
No. According to the logs, bucket1 was loaded at 19:32:04. Maybe there are some other buckets that are still not ready on this node. May I have the logs?
MB-5602: consider buckets' servers list when computing down nodes (Revision 72b674c47e386dac5a28ecaadfea2f37c3d14133) Result = SUCCESS
Farshid Ghods :
Files :
* src/auto_failover.erl
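
The commit title suggests that a node should only be judged "down" against the buckets whose servers list actually includes it. The following is a minimal Python sketch of that idea, not the actual Erlang change in src/auto_failover.erl. The function names, the data shapes, and the assumption that the earlier check compared each node against every bucket in the cluster are all hypothetical and used only for illustration.

```python
def down_nodes_all_buckets(nodes, bucket_servers, ready_buckets):
    """Hypothetical 'before' check: a node is down unless every bucket
    in the cluster is ready on it."""
    all_buckets = set(bucket_servers)
    return [n for n in nodes
            if n not in ready_buckets or not all_buckets <= ready_buckets[n]]


def down_nodes_by_servers_list(nodes, bucket_servers, ready_buckets):
    """Hypothetical 'after' check: a node is down only if a bucket that
    lists it in its servers list is not ready on it."""
    down = []
    for n in nodes:
        if n not in ready_buckets:
            # No status at all from this node: treat it as down.
            down.append(n)
            continue
        # Only buckets that already include this node are expected on it.
        expected = {b for b, servers in bucket_servers.items() if n in servers}
        if not expected <= ready_buckets[n]:
            down.append(n)
    return down


# Illustrative state mid-rebalance (assumed, not taken from the logs):
# bucket1 already lists the new node, bucket2 does not yet.
nodes = ["ns_1@10.3.2.104", "ns_1@10.3.121.126"]
bucket_servers = {
    "bucket1": ["ns_1@10.3.2.104", "ns_1@10.3.121.126"],
    "bucket2": ["ns_1@10.3.2.104"],
}
ready_buckets = {
    "ns_1@10.3.2.104": {"bucket1", "bucket2"},
    "ns_1@10.3.121.126": {"bucket1"},
}

print(down_nodes_all_buckets(nodes, bucket_servers, ready_buckets))
print(down_nodes_by_servers_list(nodes, bucket_servers, ready_buckets))
```

Under these assumptions, the first check flags the newly added node 126 while bucket2 is still being rebalanced onto it, whereas the second does not, which would be consistent with the spurious auto-failover message reported above.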