Details
Description
Scenario:
- 10-node cluster running build 1942.
- Rebalance out 5 nodes (completed successfully).
- Cluster at this point: 5 nodes.
- Add 5 nodes (running build 1944) and remove 3 nodes.
- Start the rebalance.
- Rebalance failed with reason:
Rebalance exited with reason {badmatch,
[{<0.26283.119>,
{{badmatch,{error,emfile}},
[{ns_replicas_builder_utils,
kill_a_bunch_of_tap_names,3},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}}]}
- Retried the rebalance, but it failed repeatedly with:
Rebalance exited with reason {{{badmatch,[{<18058.14511.0>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,
1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.11023.120>,
{shutdown_replicator,
'ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com'},
infinity]}}
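The `{error,emfile}` inside the first exit reason is how Erlang reports the POSIX `EMFILE` errno ("too many open files"), which suggests that the Erlang VM running that code had reached its open-file-descriptor limit. The Python sketch below is not part of ns_server; it only illustrates, on a Unix-like system, how that error arises: it temporarily lowers the process's soft `RLIMIT_NOFILE`, opens `/dev/null` until the kernel refuses, and reports the errno name. The helpers `fd_limits` and `exhaust_fds` are hypothetical names used for illustration.

```python
import errno
import os
import resource


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def exhaust_fds(limit):
    """Hypothetical demo helper: lower the soft fd limit to `limit`, open
    files until the OS refuses, and return the errno name of the failure.

    The original limits and all opened descriptors are restored before
    returning, so the calling process is left as it was.
    """
    soft, hard = fd_limits()
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    opened = []
    try:
        while True:
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        # Once the lowest free descriptor number reaches the soft limit,
        # open() fails with EMFILE, the same errno Erlang shows as `emfile`.
        return errno.errorcode[exc.errno]
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print(fd_limits())
    print(exhaust_fds(64))
```

On a Linux node reporting `emfile`, comparing the number of entries in `/proc/<pid>/fd` for the Erlang VM against its `ulimit -n` value is one way to check whether the limit was actually reached.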
Will upload logs shortly from one of the nodes that was in the cluster at the time of the rebalance failures.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Noticed this on one of the nodes being rebalanced out:
Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'.
Failed over that node and retried the rebalance; it still failed.
Added the node back and left it out of the rebalance operation; the rebalance then succeeded.
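One reading of this workaround is that the problem node stayed in the cluster and was simply left out of the set of nodes being rebalanced out. As a hedged illustration, assuming Couchbase Server's REST endpoint `POST /controller/rebalance`, which takes comma-separated otpNode names in `knownNodes` (every node in the cluster) and `ejectedNodes` (the nodes to remove), the sketch below builds that form body. `rebalance_form` is a hypothetical helper, and the node names in the usage line are placeholders rather than the hosts from this report.

```python
from urllib.parse import urlencode


def rebalance_form(known_nodes, ejected_nodes):
    """Hypothetical helper: build the form body for POST /controller/rebalance.

    known_nodes:   otpNode names (e.g. "ns_1@host") of every node in the
                   cluster, including nodes being added or removed.
    ejected_nodes: the subset of known_nodes to rebalance out.
    """
    unknown = set(ejected_nodes) - set(known_nodes)
    if unknown:
        raise ValueError(f"ejected nodes not in knownNodes: {sorted(unknown)}")
    # urlencode percent-encodes '@' and ','; the server decodes them.
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


if __name__ == "__main__":
    # Placeholder nodes: "ns_1@c" is ejected, "ns_1@b" (the problem node
    # in this scenario) stays in the cluster and is not ejected.
    print(rebalance_form(["ns_1@a", "ns_1@b", "ns_1@c"], ["ns_1@c"]))
```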
Activity
Abhinav Dangeti
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Description | Scenario as in the description, ending with "Will upload logs from one of the nodes in the cluster during the time of the rebalance failures shortly." | Same scenario, with the "Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'" note (failover attempt, then successful rebalance without that node) appended |
Abhinav Dangeti
made changes -
| Summary | Rebalance operation failed repetitively while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5 node cluster | Rebalance operation failed repetitively while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5 node cluster, reason possibly because: "Unable to listen" to one of the nodes that was being rebalanced out. |
Abhinav Dangeti
made changes -
| Description | Text as set in the previous change | Same text, with the log-upload sentence reworded to "Will upload logs from one of the nodes in the cluster present in the cluster during the time of the rebalance failures, shortly." |
Abhinav Dangeti
made changes -
| Assignee | | Abhinav Dangeti [ abhinav ] |
Ketaki Gangal
made changes -
| Assignee | Abhinav Dangeti [ abhinav ] | Aleksey Kondratenko [ alkondratenko ] |
| Priority | Major [ 3 ] | Critical [ 2 ] |
| Component/s | cross-datacenter-replication [ 10136 ] | |
| Component/s | | ns_server [ 10019 ] |
Aliaksey Artamonau
made changes -
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Abhinav Dangeti [ abhinav ] |
Junyi Xie
made changes -
| Component/s | cross-datacenter-replication [ 10136 ] |
Steve Yen
made changes -
| Assignee | Abhinav Dangeti [ abhinav ] | Aleksey Kondratenko [ alkondratenko ] |
Aleksey Kondratenko
made changes -
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | | Duplicate [ 3 ] |
Ketaki Gangal
made changes -
| Resolution | Duplicate [ 3 ] | |
| Status | Resolved [ 5 ] | Reopened [ 4 ] |
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Abhinav Dangeti [ abhinav ] |
Steve Yen
made changes -
| Status | Reopened [ 4 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
Farshid Ghods
made changes -
| Status | Resolved [ 5 ] | Closed [ 6 ] |