Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Duplicate
-
Affects Version/s: 2.0-beta
-
Fix Version/s: 2.0
-
Component/s: couchbase-bucket
-
Security Level: Public
-
Labels:
-
Environment:centos 6.2 64bit
Description
Install couchbase server 2.0.0-1645 on 11 nodes
10.3.121.13
10.3.121.14
10.3.121.15
10.3.121.16
10.3.121.17
10.3.121.20
10.3.121.22
10.3.121.26
10.3.121.24
10.3.121.25
10.3.121.23
Create cluster 10 nodes
Create 2 buckets, default and saslbucket
Create 1 docs with 2 views per doc for each bucket
Load 7+ million items to default bucket and 1+ million items to saslbucket
Maintain load about 12 K ops on default bucket and 3 K ops on saslbucket
Continuously querying views on both buckets
Rebalance in and out node
Then do swap rebalance by add 3 nodes ( 24, 25 and 26) to cluster and remove 3 nodes (20, 22 and 23) out of cluster
After 3 minutes rebalance, rebalance failed with error
2012-08-29 01:03:09.578 ns_orchestrator:4:info:message(ns_1@10.3.121.13) - Starting rebalance, KeepNodes = ['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.24',
'ns_1@10.3.121.25','ns_1@10.3.121.26'], EjectNodes = ['ns_1@10.3.121.20',
'ns_1@10.3.121.22',
'ns_1@10.3.121.23']
2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "default"
2012-08-29 01:03:09.896 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.898 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "default"
2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "default"
2012-08-29 01:03:13.771 ns_rebalancer:0:info:message(ns_1@10.3.121.13) - Started rebalancing bucket saslbucket
2012-08-29 01:03:14.794 ns_memcached:1:info:message(ns_1@10.3.121.26) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.26' in 0 seconds.
2012-08-29 01:03:14.820 ns_memcached:1:info:message(ns_1@10.3.121.24) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.24' in 0 seconds.
2012-08-29 01:03:14.831 ns_memcached:1:info:message(ns_1@10.3.121.25) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.25' in 0 seconds.
2012-08-29 01:03:16.674 ns_vbucket_mover:0:info:message(ns_1@10.3.121.13) - Bucket "saslbucket" rebalance appears to be swap rebalance
2012-08-29 01:06:21.275 ns_orchestrator:2:info:message(ns_1@10.3.121.13) - Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.121.20',
{'EXIT',
{{timeout,
{gen_server,call,
[timeout_diag_logger,
{diag,
{timeout,
{gen_server,call,[ns_config,get]}}}]}},
{gen_server,call,
[{'janitor_agent-saslbucket',
'ns_1@10.3.121.20'},
{if_rebalance,<0.12846.432>,
{update_vbucket_state,294,replica,
undefined,undefined}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover,
update_replication_post_move,3},
{ns_vbucket_mover,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
Look at log on node 20 around 2012-08-29 01:06:21 before rebalance failed, see memcached connection closed by force
[ns_server:info,2012-08-29T1:05:49.620,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.22'
[ns_server:debug,2012-08-29T1:05:50.500,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs
[ns_server:debug,2012-08-29T1:05:50.656,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:113]Got full synchronization request from 'ns_1@10.3.121.13'
[ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_log:ns_config_log:log_common:111]config change:
buckets ->
[{configs,[{"default",
[{map,[]},
{fastForwardMap,[]},
{uuid,<<"cdf4999f6920e2d4c3526e17f36c843f">>},
{sasl_password,[]},
{num_replicas,2},
{replica_index,true},
{ram_quota,3145728000},
{auth_type,sasl},
{autocompaction,false},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.20',
'ns_1@10.3.121.22','ns_1@10.3.121.23']}]},
{"saslbucket",
[{map,[{202,
['ns_1@10.3.121.15','ns_1@10.3.121.22','ns_1@10.3.121.16'],
['ns_1@10.3.121.15','ns_1@10.3.121.25',
'ns_1@10.3.121.16']}]},
{fastForwardMap,[]},
{uuid,<<"9f1fdb211536936b49c8a392e079ecb0">>},
{num_replicas,2},
{replica_index,true},
{ram_quota,3145728000},
{auth_type,sasl},
{sasl_password,"password"},
{autocompaction,false},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.24',
'ns_1@10.3.121.25','ns_1@10.3.121.26',
'ns_1@10.3.121.20','ns_1@10.3.121.22',
'ns_1@10.3.121.23']}]}]}]
[ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:119]Fully synchronized config in 458 us
[ns_server:info,2012-08-29T1:05:50.857,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:50.552220 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.17' - disconnected, keep alive for 300 seconds
memcached<0.506.0>: Wed Aug 29 08:05:50.623620 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.13' - disconnected, keep alive for 300 seconds
[ns_server:debug,2012-08-29T1:05:50.902,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs
[ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:<0.16154.0>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 1276102 us
[ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:'ns_memcached-saslbucket':ns_memcached:handle_info:594]handle_info(ensure_bucket,..) took too long: 809713 us
[stats:warn,2012-08-29T1:05:51.387,ns_1@10.3.121.20:<0.16160.0>:stats_collector:latest_tick:201]Dropped 1 ticks
[ns_server:info,2012-08-29T1:05:52.779,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:52.492646 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.24' - Connection is closed by force.
Link to diags of all nodes
https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11nodes-1645-memchaced-close-connection-by-force-20120829.tgz
10.3.121.13
10.3.121.14
10.3.121.15
10.3.121.16
10.3.121.17
10.3.121.20
10.3.121.22
10.3.121.26
10.3.121.24
10.3.121.25
10.3.121.23
Create cluster 10 nodes
Create 2 buckets, default and saslbucket
Create 1 docs with 2 views per doc for each bucket
Load 7+ million items to default bucket and 1+ million items to saslbucket
Maintain load about 12 K ops on default bucket and 3 K ops on saslbucket
Continuously querying views on both buckets
Rebalance in and out node
Then do swap rebalance by add 3 nodes ( 24, 25 and 26) to cluster and remove 3 nodes (20, 22 and 23) out of cluster
After 3 minutes rebalance, rebalance failed with error
2012-08-29 01:03:09.578 ns_orchestrator:4:info:message(ns_1@10.3.121.13) - Starting rebalance, KeepNodes = ['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.24',
'ns_1@10.3.121.25','ns_1@10.3.121.26'], EjectNodes = ['ns_1@10.3.121.20',
'ns_1@10.3.121.22',
'ns_1@10.3.121.23']
2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "default"
2012-08-29 01:03:09.896 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.898 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "default"
2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "saslbucket"
2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "default"
2012-08-29 01:03:13.771 ns_rebalancer:0:info:message(ns_1@10.3.121.13) - Started rebalancing bucket saslbucket
2012-08-29 01:03:14.794 ns_memcached:1:info:message(ns_1@10.3.121.26) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.26' in 0 seconds.
2012-08-29 01:03:14.820 ns_memcached:1:info:message(ns_1@10.3.121.24) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.24' in 0 seconds.
2012-08-29 01:03:14.831 ns_memcached:1:info:message(ns_1@10.3.121.25) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.25' in 0 seconds.
2012-08-29 01:03:16.674 ns_vbucket_mover:0:info:message(ns_1@10.3.121.13) - Bucket "saslbucket" rebalance appears to be swap rebalance
2012-08-29 01:06:21.275 ns_orchestrator:2:info:message(ns_1@10.3.121.13) - Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.3.121.20',
{'EXIT',
{{timeout,
{gen_server,call,
[timeout_diag_logger,
{diag,
{timeout,
{gen_server,call,[ns_config,get]}}}]}},
{gen_server,call,
[{'janitor_agent-saslbucket',
'ns_1@10.3.121.20'},
{if_rebalance,<0.12846.432>,
{update_vbucket_state,294,replica,
undefined,undefined}},
infinity]}}}}]},
[{janitor_agent,bulk_set_vbucket_state,4},
{ns_vbucket_mover,
update_replication_post_move,3},
{ns_vbucket_mover,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
Look at log on node 20 around 2012-08-29 01:06:21 before rebalance failed, see memcached connection closed by force
[ns_server:info,2012-08-29T1:05:49.620,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.22'
[ns_server:debug,2012-08-29T1:05:50.500,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs
[ns_server:debug,2012-08-29T1:05:50.656,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:113]Got full synchronization request from 'ns_1@10.3.121.13'
[ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_log:ns_config_log:log_common:111]config change:
buckets ->
[{configs,[{"default",
[{map,[]},
{fastForwardMap,[]},
{uuid,<<"cdf4999f6920e2d4c3526e17f36c843f">>},
{sasl_password,[]},
{num_replicas,2},
{replica_index,true},
{ram_quota,3145728000},
{auth_type,sasl},
{autocompaction,false},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.20',
'ns_1@10.3.121.22','ns_1@10.3.121.23']}]},
{"saslbucket",
[{map,[{202,
['ns_1@10.3.121.15','ns_1@10.3.121.22','ns_1@10.3.121.16'],
['ns_1@10.3.121.15','ns_1@10.3.121.25',
'ns_1@10.3.121.16']}]},
{fastForwardMap,[]},
{uuid,<<"9f1fdb211536936b49c8a392e079ecb0">>},
{num_replicas,2},
{replica_index,true},
{ram_quota,3145728000},
{auth_type,sasl},
{sasl_password,"password"},
{autocompaction,false},
{type,membase},
{num_vbuckets,1024},
{servers,['ns_1@10.3.121.13','ns_1@10.3.121.14',
'ns_1@10.3.121.15','ns_1@10.3.121.16',
'ns_1@10.3.121.17','ns_1@10.3.121.24',
'ns_1@10.3.121.25','ns_1@10.3.121.26',
'ns_1@10.3.121.20','ns_1@10.3.121.22',
'ns_1@10.3.121.23']}]}]}]
[ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:119]Fully synchronized config in 458 us
[ns_server:info,2012-08-29T1:05:50.857,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:50.552220 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.17' - disconnected, keep alive for 300 seconds
memcached<0.506.0>: Wed Aug 29 08:05:50.623620 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.13' - disconnected, keep alive for 300 seconds
[ns_server:debug,2012-08-29T1:05:50.902,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs
[ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:<0.16154.0>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 1276102 us
[ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:'ns_memcached-saslbucket':ns_memcached:handle_info:594]handle_info(ensure_bucket,..) took too long: 809713 us
[stats:warn,2012-08-29T1:05:51.387,ns_1@10.3.121.20:<0.16160.0>:stats_collector:latest_tick:201]Dropped 1 ticks
[ns_server:info,2012-08-29T1:05:52.779,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:52.492646 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.24' - Connection is closed by force.
Link to diags of all nodes
https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11nodes-1645-memchaced-close-connection-by-force-20120829.tgz
Activity
- All
- Comments
- Work Log
- History
- Activity
- Gerrit Reviews
Thuan Nguyen
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Description |
Install couchbase server 2.0.0-1645 on 11 nodes
10.3.121.13 10.3.121.14 10.3.121.15 10.3.121.16 10.3.121.17 10.3.121.20 10.3.121.22 10.3.121.26 10.3.121.24 10.3.121.25 10.3.121.23 Create cluster 10 nodes Create 2 buckets, default and saslbucket Create 1 docs with 2 views per doc for each bucket Load 7+ million items to default bucket and 1+ million items to saslbucket Maintain load about 12 K ops on default bucket and 3 K ops on saslbucket Continuously querying views on both buckets Rebalance in and out node Then do swap rebalance by add 3 nodes ( 24, 25 and 26) to cluster and remove 3 nodes (20, 22 and 23) out of cluster After 3 minutes rebalance, rebalance failed with error 2012-08-29 01:03:09.578 ns_orchestrator:4:info:message(ns_1@10.3.121.13) - Starting rebalance, KeepNodes = ['ns_1@10.3.121.13','ns_1@10.3.121.14', 'ns_1@10.3.121.15','ns_1@10.3.121.16', 'ns_1@10.3.121.17','ns_1@10.3.121.24', 'ns_1@10.3.121.25','ns_1@10.3.121.26'], EjectNodes = ['ns_1@10.3.121.20', 'ns_1@10.3.121.22', 'ns_1@10.3.121.23'] 2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "default" 2012-08-29 01:03:09.896 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.898 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "default" 2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "default" 2012-08-29 01:03:13.771 ns_rebalancer:0:info:message(ns_1@10.3.121.13) - Started rebalancing bucket saslbucket 2012-08-29 01:03:14.794 ns_memcached:1:info:message(ns_1@10.3.121.26) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.26' in 0 seconds. 2012-08-29 01:03:14.820 ns_memcached:1:info:message(ns_1@10.3.121.24) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.24' in 0 seconds. 2012-08-29 01:03:14.831 ns_memcached:1:info:message(ns_1@10.3.121.25) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.25' in 0 seconds. 2012-08-29 01:03:16.674 ns_vbucket_mover:0:info:message(ns_1@10.3.121.13) - Bucket "saslbucket" rebalance appears to be swap rebalance 2012-08-29 01:06:21.275 ns_orchestrator:2:info:message(ns_1@10.3.121.13) - Rebalance exited with reason {{bulk_set_vbucket_state_failed, [{'ns_1@10.3.121.20', {'EXIT', {{timeout, {gen_server,call, [timeout_diag_logger, {diag, {timeout, {gen_server,call,[ns_config,get]}}}]}}, {gen_server,call, [{'janitor_agent-saslbucket', 'ns_1@10.3.121.20'}, {if_rebalance,<0.12846.432>, {update_vbucket_state,294,replica, undefined,undefined}}, infinity]}}}}]}, [{janitor_agent,bulk_set_vbucket_state,4}, {ns_vbucket_mover, update_replication_post_move,3}, {ns_vbucket_mover,handle_info,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]} Look at log on node 20 around2012-08-29 01:06:21 before rebalance failed, see memcached connection closed by force [ns_server:debug,2012-08-29T1:05:50.902,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs [ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:<0.16154.0>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 1276102 us [ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:'ns_memcached-saslbucket':ns_memcached:handle_info:594]handle_info(ensure_bucket,..) took too long: 809713 us [stats:warn,2012-08-29T1:05:51.387,ns_1@10.3.121.20:<0.16160.0>:stats_collector:latest_tick:201]Dropped 1 ticks [ns_server:info,2012-08-29T1:05:52.779,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:52.492646 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.24' - Connection is closed by force. Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11nodes-1645-memchaced-close-connection-by-force-20120829.tgz |
Install couchbase server 2.0.0-1645 on 11 nodes
10.3.121.13 10.3.121.14 10.3.121.15 10.3.121.16 10.3.121.17 10.3.121.20 10.3.121.22 10.3.121.26 10.3.121.24 10.3.121.25 10.3.121.23 Create cluster 10 nodes Create 2 buckets, default and saslbucket Create 1 docs with 2 views per doc for each bucket Load 7+ million items to default bucket and 1+ million items to saslbucket Maintain load about 12 K ops on default bucket and 3 K ops on saslbucket Continuously querying views on both buckets Rebalance in and out node Then do swap rebalance by add 3 nodes ( 24, 25 and 26) to cluster and remove 3 nodes (20, 22 and 23) out of cluster After 3 minutes rebalance, rebalance failed with error 2012-08-29 01:03:09.578 ns_orchestrator:4:info:message(ns_1@10.3.121.13) - Starting rebalance, KeepNodes = ['ns_1@10.3.121.13','ns_1@10.3.121.14', 'ns_1@10.3.121.15','ns_1@10.3.121.16', 'ns_1@10.3.121.17','ns_1@10.3.121.24', 'ns_1@10.3.121.25','ns_1@10.3.121.26'], EjectNodes = ['ns_1@10.3.121.20', 'ns_1@10.3.121.22', 'ns_1@10.3.121.23'] 2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.893 ns_storage_conf:0:info:message(ns_1@10.3.121.24) - Deleting old data files of bucket "default" 2012-08-29 01:03:09.896 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.898 ns_storage_conf:0:info:message(ns_1@10.3.121.26) - Deleting old data files of bucket "default" 2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "saslbucket" 2012-08-29 01:03:09.942 ns_storage_conf:0:info:message(ns_1@10.3.121.25) - Deleting old data files of bucket "default" 2012-08-29 01:03:13.771 ns_rebalancer:0:info:message(ns_1@10.3.121.13) - Started rebalancing bucket saslbucket 2012-08-29 01:03:14.794 ns_memcached:1:info:message(ns_1@10.3.121.26) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.26' in 0 seconds. 2012-08-29 01:03:14.820 ns_memcached:1:info:message(ns_1@10.3.121.24) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.24' in 0 seconds. 2012-08-29 01:03:14.831 ns_memcached:1:info:message(ns_1@10.3.121.25) - Bucket "saslbucket" loaded on node 'ns_1@10.3.121.25' in 0 seconds. 2012-08-29 01:03:16.674 ns_vbucket_mover:0:info:message(ns_1@10.3.121.13) - Bucket "saslbucket" rebalance appears to be swap rebalance 2012-08-29 01:06:21.275 ns_orchestrator:2:info:message(ns_1@10.3.121.13) - Rebalance exited with reason {{bulk_set_vbucket_state_failed, [{'ns_1@10.3.121.20', {'EXIT', {{timeout, {gen_server,call, [timeout_diag_logger, {diag, {timeout, {gen_server,call,[ns_config,get]}}}]}}, {gen_server,call, [{'janitor_agent-saslbucket', 'ns_1@10.3.121.20'}, {if_rebalance,<0.12846.432>, {update_vbucket_state,294,replica, undefined,undefined}}, infinity]}}}}]}, [{janitor_agent,bulk_set_vbucket_state,4}, {ns_vbucket_mover, update_replication_post_move,3}, {ns_vbucket_mover,handle_info,2}, {gen_server,handle_msg,5}, {proc_lib,init_p_do_apply,3}]} Look at log on node 20 around 2012-08-29 01:06:21 before rebalance failed, see memcached connection closed by force [ns_server:info,2012-08-29T1:05:49.620,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.22' [ns_server:debug,2012-08-29T1:05:50.500,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs [ns_server:debug,2012-08-29T1:05:50.656,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:113]Got full synchronization request from 'ns_1@10.3.121.13' [ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_log:ns_config_log:log_common:111]config change: buckets -> [{configs,[{"default", [{map,[]}, {fastForwardMap,[]}, {uuid,<<"cdf4999f6920e2d4c3526e17f36c843f">>}, {sasl_password,[]}, {num_replicas,2}, {replica_index,true}, {ram_quota,3145728000}, {auth_type,sasl}, {autocompaction,false}, {type,membase}, {num_vbuckets,1024}, {servers,['ns_1@10.3.121.13','ns_1@10.3.121.14', 'ns_1@10.3.121.15','ns_1@10.3.121.16', 'ns_1@10.3.121.17','ns_1@10.3.121.20', 'ns_1@10.3.121.22','ns_1@10.3.121.23']}]}, {"saslbucket", [{map,[{202, ['ns_1@10.3.121.15','ns_1@10.3.121.22','ns_1@10.3.121.16'], ['ns_1@10.3.121.15','ns_1@10.3.121.25', 'ns_1@10.3.121.16']}]}, {fastForwardMap,[]}, {uuid,<<"9f1fdb211536936b49c8a392e079ecb0">>}, {num_replicas,2}, {replica_index,true}, {ram_quota,3145728000}, {auth_type,sasl}, {sasl_password,"password"}, {autocompaction,false}, {type,membase}, {num_vbuckets,1024}, {servers,['ns_1@10.3.121.13','ns_1@10.3.121.14', 'ns_1@10.3.121.15','ns_1@10.3.121.16', 'ns_1@10.3.121.17','ns_1@10.3.121.24', 'ns_1@10.3.121.25','ns_1@10.3.121.26', 'ns_1@10.3.121.20','ns_1@10.3.121.22', 'ns_1@10.3.121.23']}]}]}] [ns_server:debug,2012-08-29T1:05:50.660,ns_1@10.3.121.20:ns_config_rep:ns_config_rep:handle_call:119]Fully synchronized config in 458 us [ns_server:info,2012-08-29T1:05:50.857,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:50.552220 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.17' - disconnected, keep alive for 300 seconds memcached<0.506.0>: Wed Aug 29 08:05:50.623620 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.13' - disconnected, keep alive for 300 seconds [ns_server:debug,2012-08-29T1:05:50.902,ns_1@10.3.121.20:'capi_ddoc_replication_srv-default':cb_generic_replication_srv:handle_info:144]doing replicate_newnodes_docs [ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:<0.16154.0>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 1276102 us [ns_server:error,2012-08-29T1:05:51.136,ns_1@10.3.121.20:'ns_memcached-saslbucket':ns_memcached:handle_info:594]handle_info(ensure_bucket,..) took too long: 809713 us [stats:warn,2012-08-29T1:05:51.387,ns_1@10.3.121.20:<0.16160.0>:stats_collector:latest_tick:201]Dropped 1 ticks [ns_server:info,2012-08-29T1:05:52.779,ns_1@10.3.121.20:ns_port_memcached:ns_port_server:log:169]memcached<0.506.0>: Wed Aug 29 08:05:52.492646 3: TAP (Producer) eq_tapq:replication_building_272_'ns_1@10.3.121.24' - Connection is closed by force. Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11nodes-1645-memchaced-close-connection-by-force-20120829.tgz |
Thuan Nguyen
made changes -
| Summary | memcached closes the connection on source with error Rebalance exited with reason {{bulk_set_vbucket_state_failed, [{'ns_1@10.3.121.20', {'EXIT', {{timeout, {gen_server,call, [timeout_diag_logger, {diag, {timeout, | rebalance failed due to memcached connection is closed by force |
Chiyoung Seo
made changes -
| Sprint Status | Current Sprint |
Thuan Nguyen
made changes -
| Attachment | memcached-log-1663-20120905.tgz [ 14770 ] |
Chiyoung Seo
made changes -
| Summary | rebalance failed due to memcached connection is closed by force | Rebalance failed due to the timeout in bulk_set_vbucket_state operation with continuous view queries |
Farshid Ghods
made changes -
| Labels | 2.0-beta-release-notes |
Farshid Ghods
made changes -
| Fix Version/s | 2.0 [ 10114 ] | |
| Fix Version/s | 2.0-beta [ 10113 ] |
Peter Wansch
made changes -
| Assignee | Chiyoung Seo [ chiyoung ] | Aleksey Kondratenko [ alkondratenko ] |
Karan Kumar
made changes -
| Labels | 2.0-beta-release-notes | 2.0-beta-release-notes system-test |
Farshid Ghods
made changes -
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Thuan Nguyen [ thuan ] |
Chiyoung Seo
made changes -
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | Duplicate [ 3 ] |
Chiyoung Seo
made changes -
| Sprint Status | Current Sprint |
Farshid Ghods
made changes -
| Status | Resolved [ 5 ] | Closed [ 6 ] |