Details
Description
Scenario:
- 10-node cluster with build 1942
- Rebalance out 5 nodes (completed successfully)
- Cluster right now: 5 nodes
- Add 5 nodes (with build 1944) and remove 3 nodes.
- Hit rebalance.
- Rebalance failed with reason:
    Rebalance exited with reason {badmatch,
      [{<0.26283.119>,
        {{badmatch,{error,emfile}},
         [{ns_replicas_builder_utils,kill_a_bunch_of_tap_names,3},
          {misc,try_with_maybe_ignorant_after,2},
          {gen_server,terminate,6},
          {proc_lib,init_p_do_apply,3}]}}]}
- Retried the rebalance, but it failed repeatedly with:
    Rebalance exited with reason {{{badmatch,[{<18058.14511.0>,noproc}]},
        [{misc,sync_shutdown_many_i_am_trapping_exits,1},
         {misc,try_with_maybe_ignorant_after,2},
         {gen_server,terminate,6},
         {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
        [<0.11023.120>,
         {shutdown_replicator,
          'ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com'},
         infinity]}}
Will upload logs shortly from one of the nodes that was in the cluster at the time of the rebalance failures.
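The `{error,emfile}` in the first trace is Erlang's spelling of the POSIX `EMFILE` error, "Too many open files": the process had used up its per-process file-descriptor limit. This report does not establish why the limit was reached on that node. As a hedged illustration only (Python, Unix-only, not ns_server code; `fd_limits` and `provoke_emfile` are hypothetical helper names), the sketch below reads that limit with `getrlimit(RLIMIT_NOFILE)` and shows how exhausting it surfaces as `EMFILE`:

```python
import errno
import os
import resource  # Unix-only module


def fd_limits():
    """Return the (soft, hard) per-process open-file limits (RLIMIT_NOFILE)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def provoke_emfile(soft_limit=32):
    """Lower the soft open-file limit, open descriptors until the OS refuses,
    and return the errno of that failure (normally errno.EMFILE).

    Every descriptor opened here is closed and the original limits are
    restored before returning.
    """
    soft, hard = fd_limits()
    # The soft limit may not exceed the hard limit unless the hard limit is unlimited.
    if hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, hard)
    opened = []
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    try:
        while True:
            # Each os.open call consumes one file descriptor.
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("open-file limits (soft, hard):", fd_limits())
    code = provoke_emfile()
    print(errno.errorcode[code], os.strerror(code))
```

On a real node the corresponding check would be `ulimit -n` for the user running the server; whether that limit was too low in this scenario is an assumption the report does not confirm.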
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Noticed this on one of the nodes being rebalanced out:
Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'.
Failed over that node and retried the rebalance; it still failed.
Then added that node back and excluded it from the rebalance operation; this time the rebalance succeeded.
Activity
Abhinav Dangeti
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Description | Scenario and rebalance error traces as above, ending with "Will upload logs from one of the nodes in the cluster during the time of the rebalance failures shortly." | Same text, with the follow-up note appended below a separator: the "Unable to listen on 'ns_1@ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com'" observation and the failover / add-back workaround. |
Abhinav Dangeti
made changes -
| Summary | Rebalance operation failed repeatedly while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5-node cluster | Rebalance operation failed repeatedly while trying to rebalance in 5 nodes and rebalance out 3 nodes on a 5-node cluster, possibly because of "Unable to listen" on one of the nodes being rebalanced out. |
Abhinav Dangeti
made changes -
| Description | Text as above, with the log-upload sentence reading "...in the cluster during the time of the rebalance failures shortly." | Same text, with that sentence changed to "...in the cluster present in the cluster during the time of the rebalance failures, shortly." |
Abhinav Dangeti
made changes -
| Assignee | | Abhinav Dangeti [ abhinav ] |
Ketaki Gangal
made changes -
| Assignee | Abhinav Dangeti [ abhinav ] | Aleksey Kondratenko [ alkondratenko ] |
| Priority | Major [ 3 ] | Critical [ 2 ] |
| Component/s | cross-datacenter-replication [ 10136 ] | |
| Component/s | | ns_server [ 10019 ] |
Aliaksey Artamonau
made changes -
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Abhinav Dangeti [ abhinav ] |
Junyi Xie
made changes -
| Component/s | | cross-datacenter-replication [ 10136 ] |
Steve Yen
made changes -
| Assignee | Abhinav Dangeti [ abhinav ] | Aleksey Kondratenko [ alkondratenko ] |
Aleksey Kondratenko
made changes -
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | | Duplicate [ 3 ] |
Ketaki Gangal
made changes -
| Resolution | Duplicate [ 3 ] | |
| Status | Resolved [ 5 ] | Reopened [ 4 ] |
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Abhinav Dangeti [ abhinav ] |
Steve Yen
made changes -
| Status | Reopened [ 4 ] | Resolved [ 5 ] |
| Resolution | | Fixed [ 1 ] |
Farshid Ghods
made changes -
| Status | Resolved [ 5 ] | Closed [ 6 ] |