Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.1
- Fix Version/s: 2.0.1
- Component/s: couchbase-bucket
- Security Level: Public
- Labels:
- Environment: Windows 2008 R2 64-bit
Description
Environment:
* 9 Windows 2008 R2 64-bit servers.
* Each server has 4 CPUs, 8 GB RAM, and an SSD disk.
* The cluster has 2 buckets, a default bucket and a sasl bucket, with consistent views enabled.
* Loaded 26 million items into the default bucket and 16 million items into the sasl bucket. Each key is 128 to 512 bytes in size.
* Each bucket has one design doc, and each design doc has 2 views.
- Rebalance out 2 nodes 10.3.121.173 and 10.3.121.243
Starting rebalance, KeepNodes = ['ns_1@10.3.3.181','ns_1@10.3.121.47',
'ns_1@10.3.3.214','ns_1@10.3.3.182',
'ns_1@10.3.3.180','ns_1@10.3.121.171',
'ns_1@10.3.121.169'], EjectNodes = ['ns_1@10.3.121.173',
'ns_1@10.3.121.243'] ns_orchestrator004 ns_1@10.3.121.169 23:26:03 - Tue Jan 22, 2013
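For reference, the same rebalance-out can be expressed as a REST request. The ticket does not say whether the rebalance was started from the UI or through REST, so this is only a sketch: it assumes the documented Couchbase endpoint POST /controller/rebalance, which takes knownNodes (every node currently in the cluster) and ejectedNodes. The helper name rebalance_form is hypothetical; it only builds the form body and does not contact a server.

```python
from urllib.parse import urlencode

# Node names copied from the "Starting rebalance" log entry above.
KEEP_NODES = [
    "ns_1@10.3.3.181", "ns_1@10.3.121.47", "ns_1@10.3.3.214",
    "ns_1@10.3.3.182", "ns_1@10.3.3.180", "ns_1@10.3.121.171",
    "ns_1@10.3.121.169",
]
EJECT_NODES = ["ns_1@10.3.121.173", "ns_1@10.3.121.243"]


def rebalance_form(keep_nodes, eject_nodes):
    """Build the form body for POST /controller/rebalance (hypothetical helper).

    Assumption: knownNodes must list all current cluster members,
    including the ones being ejected.
    """
    known = list(keep_nodes) + list(eject_nodes)
    return urlencode({
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(eject_nodes),
    })


body = rebalance_form(KEEP_NODES, EJECT_NODES)
print(body)
```

The resulting string would be sent as the request body to port 8091 on any cluster node, with administrator credentials.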
- Rebalance failed because buckets were shutting down on the orchestrator node.
[ns_server:debug,2013-01-23T8:29:27.672,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
rebalance_status ->
{none,<<"Rebalance stopped by janitor.">>}
[user:info,2013-01-23T8:29:26.219,ns_1@10.3.121.169:ns_memcached-default<0.968.1>:ns_memcached:terminate:661]Shutting down bucket "default" on 'ns_1@10.3.121.169' for server shutdown
[ns_server:error,2013-01-23T8:29:26.219,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
{<0.12009.70>,
[{registered_name,[]},
{status,waiting},
{initial_call,{proc_lib,init_p,5}},
{backtrace,[<<"Program counter: 0x04e7e1c8 (couch_file:reader_loop/3 + 116)">>,
<<"CP: 0x00000000 (invalid)">>,<<"arity = 0">>,<<>>,
<<"0x126e4ce4 Return addr 0x017a2da8 (proc_lib:init_p_do_apply/3 + 28)">>,
<<"y(0) 10">>,<<"y(1) \"c:/data/sasl/109.couch.14\"">>,
<<"y(2) []">>,<<>>,
<<"0x126e4cf4 Return addr 0x00b409b4 (<terminate process normally>)">>,
<<"y(0) Catch 0x017a2db8 (proc_lib:init_p_do_apply/3 + 44)">>,
<<>>]},
{error_handler,error_handler},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,512},
{minor_gcs,403}]},
{heap_size,377},
{total_heap_size,754},
{links,[<0.12008.70>]},
{memory,3496},
{message_queue_len,0},
{reductions,216588},
{trap_exit,true}]}
[ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:<0.835.0>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {buckets_events,<0.833.0>} exited with reason {shutdown,
{gen_server,
call,
['ns_vbm_new_sup-sasl',
which_children,
infinity]}}
[ns_server:debug,2013-01-23T8:29:27.313,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
rebalancer_pid ->
undefined
[ns_server:debug,2013-01-23T8:29:27.329,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[user:info,2013-01-23T8:29:27.329,ns_1@10.3.121.169:ns_memcached-sasl<0.8955.0>:ns_memcached:terminate:661]Shutting down bucket "sasl" on 'ns_1@10.3.121.169' for server shutdown
[ns_server:debug,2013-01-23T8:29:27.344,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
auto_failover_cfg ->
[{enabled,false},{timeout,30},{max_nodes,1},{count,0}]
[ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:ns_config_rep<0.31635.76>:ns_config_rep:do_push_keys:317]Replicating some config keys ([auto_failover_cfg,autocompaction,buckets,
cluster_compat_version,counters,
dynamic_config_version]..)
[ns_server:debug,2013-01-23T8:29:27.360,ns_1@10.3.121.169:capi_set_view_manager-sasl<0.8923.0>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
[ns_server:error,2013-01-23T8:29:27.360,ns_1@10.3.121.169:timeout_diag_logger<0.699.0>:timeout_diag_logger:handle_call:104]
{<0.10831.67>,
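The ns_server entries above are not pasted in chronological order: the "Rebalance stopped by janitor" config change is stamped 8:29:27.672, after the "default" bucket shutdown at 8:29:26.219. A minimal sketch for sorting such entries by timestamp follows. The header layout (category:level, timestamp, node, process<pid>, module:function:line) is inferred from the lines in this ticket rather than from a format specification, and the names HEADER and parse_header are hypothetical.

```python
import re
from datetime import datetime

# Header layout inferred from the excerpt; the leading "[" is treated as
# optional in case it was lost when the log was pasted.
HEADER = re.compile(
    r"^\[?(?P<category>\w+):(?P<level>\w+),"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{1,2}:\d{2}:\d{2}\.\d{3}),"
    r"(?P<node>[^:]+):(?P<process>[^:]*?)<(?P<pid>[\d.]+)>:"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def parse_header(line):
    """Return a dict of header fields, or None for continuation lines."""
    m = HEADER.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # The hour is not zero-padded in these logs ("T8:29:27.672").
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%S.%f")
    rec["line"] = int(rec["line"])
    return rec


SAMPLE = """\
[ns_server:debug,2013-01-23T8:29:27.672,ns_1@10.3.121.169:ns_config_log<0.803.0>:ns_config_log:log_common:111]config change:
[user:info,2013-01-23T8:29:26.219,ns_1@10.3.121.169:ns_memcached-default<0.968.1>:ns_memcached:terminate:661]Shutting down bucket "default" on 'ns_1@10.3.121.169' for server shutdown
[user:info,2013-01-23T8:29:27.329,ns_1@10.3.121.169:ns_memcached-sasl<0.8955.0>:ns_memcached:terminate:661]Shutting down bucket "sasl" on 'ns_1@10.3.121.169' for server shutdown
""".splitlines()

events = sorted(filter(None, map(parse_header, SAMPLE)), key=lambda r: r["ts"])
for e in events:
    print(e["ts"].time(), e["module"], e["message"])
```

Sorted this way, both bucket shutdowns come before the rebalance_status change, which matches the summary that the rebalance stopped because the buckets were shutting down.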
- Memcached logs from around the time the rebalance failed
Wed Jan 23 08:29:27.208484 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_18 - disconnected
Wed Jan 23 08:29:27.286609 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_20 - disconnected
Wed Jan 23 08:29:28.145984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_17"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_18"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_19"
Wed Jan 23 08:29:28.161609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_20"
Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_21"
Wed Jan 23 08:29:28.177234 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_22"
Wed Jan 23 08:29:28.192859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_23"
Wed Jan 23 08:29:28.208484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:anon_24"
Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Shutting down tap connections!
Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: Failed to notify thread: Unknown error
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
Wed Jan 23 08:29:29.130359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
Wed Jan 23 08:29:42.130359 Pacific Standard Time 3: Had to wait 12 s for shutdown
Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Shutting down tap connections!
Wed Jan 23 08:30:01.442859 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.47"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.171"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.47 - Clear the tap queues by force
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.181"
Wed Jan 23 08:30:01.505359 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.214"
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.181 - Clear the tap queues by force
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.180"
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.182"
Wed Jan 23 08:30:01.520984 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.214 - Clear the tap queues by force
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.180 - Clear the tap queues by force
Wed Jan 23 08:30:01.536609 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.182 - Clear the tap queues by force
Wed Jan 23 08:30:16.536609 Pacific Standard Time 3: Had to wait 15 s for shutdown
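To triage a longer memcached log for the same symptoms, the lines can be bucketed by message type and the reported shutdown waits collected. This is a rough sketch: the line layout is inferred from the excerpt above, the meaning of the number before the colon ("3") is not stated in the ticket, so it is captured as an opaque tag, and the names LINE, WAIT and summarize are hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the excerpt: "<dow> <mon> <day> <time> <tz> <tag>: <msg>"
LINE = re.compile(
    r"^(?P<stamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<tz>.+?) (?P<tag>\d+): (?P<msg>.*)$"
)
WAIT = re.compile(r"Had to wait (\d+) s for shutdown")


def summarize(lines):
    """Count message kinds and collect the reported shutdown waits (seconds)."""
    kinds = Counter()
    waits = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # skip lines that do not follow the inferred layout
        msg = m["msg"]
        w = WAIT.search(msg)
        if w:
            waits.append(int(w.group(1)))
        elif msg.startswith("Failed to notify thread"):
            kinds["notify_failed"] += 1
        elif "Clear the tap queues by force" in msg:
            kinds["forced_clear"] += 1
        elif msg.startswith("Schedule cleanup of"):
            kinds["cleanup"] += 1
        else:
            kinds["other"] += 1
    return kinds, waits


SAMPLE = [
    "Wed Jan 23 08:29:29.005359 Pacific Standard Time 3: Shutting down tap connections!",
    "Wed Jan 23 08:29:29.083484 Pacific Standard Time 3: Failed to notify thread: Unknown error",
    "Wed Jan 23 08:29:29.114734 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.171 - Clear the tap queues by force",
    "Wed Jan 23 08:29:42.130359 Pacific Standard Time 3: Had to wait 12 s for shutdown",
    "Wed Jan 23 08:30:16.536609 Pacific Standard Time 3: Had to wait 15 s for shutdown",
]

kinds, waits = summarize(SAMPLE)
print(dict(kinds), waits)
```

Run against the full log rather than this five-line sample, the same function would show how often the "Failed to notify thread" error accompanies each bucket shutdown.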
Link to the manifest file for this build: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.1-140-rel.setup.exe.manifest.xml
Issue Links
- depends on MB-7658: [Windows] severe timeouts in windows cluster with 2 buckets and 1 ddoc per bucket causes rebalance and other failures