Details
Description
Setup
1. Cluster (nodes 94, 95) has 25.1M items per bucket.
2. Mutate items with much larger values, causing fragmentation.
3. Node 94 is in heavy swap (84%).
4. Resident ratio on node 95 has dropped to < 1 percent.
5. Restart node 95.
6. Issue rebalance, adding node 97.
7. Stop the rebalance.
The following messages then appear in the log:
Port server moxi on node 'ns_1@10.3.2.97' exited with status 137. Restarting. Messages: 2012-06-20 16:12:14: (cproxy_config.c.317) env: MOXI_SASL_PLAIN_USR (13)
2012-06-20 16:12:14: (cproxy_config.c.326) env: MOXI_SASL_PLAIN_PWD (8)
Port server memcached on node 'ns_1@10.3.2.97' exited with status 137. Restarting. Messages: TAP (Producer) eq_tapq:rebalance_483 - Clear the tap queues by force
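A note on the exit status in the messages above: under the common Unix convention of reporting 128 + signal number, status 137 corresponds to signal 9 (SIGKILL). This suggests both moxi and memcached on node 97 were killed rather than exiting on their own. Given the heavy swap, the kernel OOM killer is one possible source, but the logs quoted here do not confirm that. A minimal Python sketch of the decoding, assuming that convention applies (the helper name `decode_exit_status` is hypothetical, not part of any Couchbase tool):

```python
import signal


def decode_exit_status(status):
    """Interpret a shell-style exit status.

    Assumes the 128 + signal-number convention for processes
    terminated by a signal. Returns (kind, number, name).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown"
        return ("signal", signum, name)
    # Otherwise treat it as an ordinary exit code.
    return ("exit", status, None)


if __name__ == "__main__":
    print(decode_exit_status(137))
```

If this reading is right, checking the kernel log on node 97 around 16:12 for OOM killer entries would be a reasonable next step.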
On re-issuing the rebalance, this time removing node 94, the rebalance fails with:
Rebalance exited with reason {wait_for_memcached_failed,"bucket2",
['ns_1@10.3.2.97']}
Logs from all the nodes are attached: https://s3.amazonaws.com/bugdb/jira/bug-cluster-swap/bug.tar
A screenshot of the current state is also attached.
The live cluster can be accessed at http://10.3.2.94:8091/index.html#sec=log&serversTab=0
[ns_server:error] [2012-06-20 18:29:32] [ns_1@10.3.2.95:ns_doctor:ns_doctor:update_status:154] The following buckets became not ready on node 'ns_1@10.3.2.97': ["bucket1",
"bucket2"], those of them are active ["bucket1",
"bucket2"]
[error_logger:error] [2012-06-20 18:29:28] [ns_1@10.3.2.95:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,'ns_vbm_sup-bucket2'}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.5948.0>},