Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD.
- SSD is formatted as ext4 and mounted on /data
- Each server has its own SSD drive; no disk is shared with other servers.
- Create a cluster of 6 nodes with Couchbase Server 2.0.0-1862 installed
- Cluster has 2 buckets, default (12 GB) and saslbucket (12 GB).
- Each bucket has one design doc with 2 views (d1 in default, d11 in saslbucket)
- Consistent view is enabled on the cluster (default)
- Change the Erlang option in the couchbase-server script from +A 16 to +S 128:128
10.6.2.37
10.6.2.38
10.6.2.39
10.6.2.40
10.6.2.42
10.6.2.43
* Load 15 million items into each bucket. Each key has a size from 512 bytes to 1024 bytes
* Query all 4 views from the 2 design docs
* Mutate 15 million items with key size from 1024 to 1500 bytes
* Do a swap rebalance: add nodes 44 and 45, remove nodes 39 and 40
* Rebalance moves some items and then hangs for hours. Filed bug MB-6953
* Tried to stop the rebalance but it failed. Will re-open bug MB-6707.
* Stop Couchbase Server on node 37. Node 37 goes down but the rebalance does not stop
* Go to node 38 and click stop rebalance. The rebalance stops. Then restart Couchbase Server on node 37.
* After node 37 has been up for a while, rebalance the cluster again. The rebalance fails within a few minutes with this error:
Rebalance exited with reason {{{{badmatch,
{error,
{error,
<<"Partition 854 not in active nor passive set">>}}},
[{capi_set_view_manager,handle_call,3},
{gen_server,handle_msg,5},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['capi_set_view_manager-saslbucket',
{wait_index_updated,854},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@10.6.2.37'},
{if_rebalance,<0.8171.289>,
{wait_index_updated,513}},
infinity]}}
ns_orchestrator002
ns_1@10.6.2.38
22:52:21 - Wed Oct 17, 2012
Server error during processing: ["web request failed",
{path,
"/pools/default/buckets/default/statsDirectory"},
{type,exit},
{what,
{noproc,
{gen_server,call,
['capi_set_view_manager-default',
{foreach_doc,
#Fun<capi_ddoc_replication_srv.1.36030090>},
infinity]}}},
{trace,
[{gen_server,call,3},
{capi_ddoc_replication_srv,
foreach_live_ddoc_id,2},
{capi_ddoc_replication_srv,fetch_ddoc_ids,
1},
{menelaus_stats,
couchbase_view_stats_descriptions,1},
{menelaus_stats,membase_stats_description,
1},
{menelaus_stats,serve_stats_directory,3},
{menelaus_web_buckets,
checking_bucket_access,4},
{menelaus_web,loop,3}]}]
menelaus_web019
ns_1@10.6.2.45
22:52:19 - Wed Oct 17, 2012
<0.8771.289> exited with {{{{badmatch,
{error,
{error,
<<"Partition 854 not in active nor passive set">>}}},
[{capi_set_view_manager,handle_call,3},
{gen_server,handle_msg,5},
{gen_server,init_it,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['capi_set_view_manager-saslbucket',
{wait_index_updated,854},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@10.6.2.37'},
{if_rebalance,<0.8171.289>,
{wait_index_updated,513}},
infinity]}}
ns_vbucket_mover000
ns_1@10.6.2.38
22:52:10 - Wed Oct 17, 2012
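When triaging logs like the ones above, it can help to pull out which partition the view manager rejected and which partitions the rebalance was waiting on. The following is a minimal sketch, not part of Couchbase tooling: the function name `summarize` and the regular expressions are illustrative assumptions, matched only against the message shapes shown in this report.

```python
import re

# Patterns written against the exact message shapes quoted in this report;
# they are illustrative and may not cover other log formats.
PARTITION_RE = re.compile(r"Partition (\d+) not in active nor passive set")
WAIT_RE = re.compile(r"\{wait_index_updated,\s*(\d+)\}")


def summarize(log_text):
    """Return the partition IDs named in the error and in wait_index_updated calls."""
    rejected = sorted({int(p) for p in PARTITION_RE.findall(log_text)})
    waited = sorted({int(p) for p in WAIT_RE.findall(log_text)})
    return {"rejected_partitions": rejected, "waited_partitions": waited}


# Excerpt copied from the ns_orchestrator entry in this report.
sample = '''
Rebalance exited with reason {{{{badmatch,
{error,
{error,
<<"Partition 854 not in active nor passive set">>}}},
{gen_server,call,
['capi_set_view_manager-saslbucket',
{wait_index_updated,854},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@10.6.2.37'},
{if_rebalance,<0.8171.289>,
{wait_index_updated,513}},
infinity]}}
'''

summary = summarize(sample)
print(summary)
```

On this excerpt the sketch reports partition 854 as rejected, while the wait_index_updated calls name both 513 (at janitor_agent) and 854 (at capi_set_view_manager).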
* Link to manifest file http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1862-rel.rpm.manifest.xml
* Link to collect info of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/8nodes-col-info-1862-reb-failed-Partition-not-in-active-nor-passive-set-20121017-233606.tgz
* This bug is similar to bug MB-6490, which is marked as fixed
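The swap rebalance step in the reproduction above was driven from the UI. For scripted reproduction, the same request could be sketched as below. This is a hedged sketch only: it builds the form body that, to my understanding of the Couchbase REST API, is POSTed to /controller/rebalance (with /controller/stopRebalance used to stop it), after the new nodes have already been added to the cluster. The helper names are hypothetical, and the full addresses for nodes 44 and 45 (10.6.2.44, 10.6.2.45) are assumed from the numbering used in this report.

```python
from urllib.parse import urlencode


def otp_node(ip):
    """Build the Erlang node name in the form seen in the logs (ns_1@<ip>)."""
    return "ns_1@" + ip


def swap_rebalance_form(current_ips, add_ips, remove_ips):
    """Build knownNodes/ejectedNodes form fields for a swap rebalance.

    knownNodes lists every node in the cluster, including the ones being
    ejected and the ones already added; ejectedNodes lists the nodes to remove.
    """
    known = list(current_ips) + [ip for ip in add_ips if ip not in current_ips]
    unknown = [ip for ip in remove_ips if ip not in known]
    if unknown:
        raise ValueError("cannot eject nodes not in the cluster: %s" % unknown)
    return {
        "knownNodes": ",".join(otp_node(ip) for ip in known),
        "ejectedNodes": ",".join(otp_node(ip) for ip in remove_ips),
    }


# The 6 original nodes from this report.
current = ["10.6.2.37", "10.6.2.38", "10.6.2.39",
           "10.6.2.40", "10.6.2.42", "10.6.2.43"]

# Assumed addresses for "node 44, 45"; nodes 39 and 40 are removed.
form = swap_rebalance_form(current,
                           add_ips=["10.6.2.44", "10.6.2.45"],
                           remove_ips=["10.6.2.39", "10.6.2.40"])

# URL-encoded body, ready to send with any HTTP client and admin credentials.
body = urlencode(form)
print(body)
```

Building the body separately from sending it keeps the sketch runnable without a cluster and makes it easy to check that the ejected nodes are the intended ones before issuing the request.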