Details
Description
Restarts happen on 1 or 2 nodes every time I run tests, usually with the same error.
Loading data and initial indexing complete without problems. Why does this happen?
[ns_server:info,2012-11-06T12:43:31.635,ns_1@10.2.3.31:mb_master<0.18558.13>:mb_master:terminate:288]Synchronously shutting down child mb_master_sup
[error_logger:error,2012-11-06T12:43:32.745,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,mb_master_sup}
Context: shutdown_error
Reason: killed
Offender: [{pid,<0.19815.13>},
{name,ns_orchestrator},
{mfargs,{ns_orchestrator,start_link,[]}},
{restart_type,permanent},
{shutdown,20},
{child_type,worker}]
[stats:warn,2012-11-06T12:43:32.651,ns_1@10.2.3.31:system_stats_collector<0.478.0>:system_stats_collector:handle_info:133]lost 7 ticks
[ns_server:debug,2012-11-06T12:43:33.495,ns_1@10.2.3.31:<0.18559.13>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_config_events,<0.18558.13>} exited with reason {timeout,
{gen_server,
call,
[ns_node_disco,
nodes_wanted]}}
[error_logger:error,2012-11-06T12:43:34.073,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** State machine mb_master terminating
** Last message in was send_heartbeat
** When State == master
** Data == {state,<0.19814.13>,'ns_1@10.2.3.31',
['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34',
'ns_1@10.2.3.35'],
{1352,234605,120106}}
** Reason for termination =
** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
[ns_server:debug,2012-11-06T12:43:35.166,ns_1@10.2.3.31:ns_server_sup<0.385.0>:mb_master:check_master_takeover_needed:144]Sending master node question to the following nodes: ['ns_1@10.2.3.35',
'ns_1@10.2.3.34',
'ns_1@10.2.3.33']
[error_logger:error,2012-11-06T12:43:35.276,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: mb_master:init/1
pid: <0.18558.13>
registered_name: mb_master
exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
in function gen_fsm:terminate/7
ancestors: [ns_server_sup,ns_server_cluster_sup,<0.66.0>]
messages: [send_heartbeat,send_heartbeat,send_heartbeat,send_heartbeat,
{#Ref<0.0.372.79904>,
['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34',
'ns_1@10.2.3.35']}]
links: [<0.385.0>,<0.18559.13>,<0.63.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 147300
neighbours:
[error_logger:error,2012-11-06T12:43:35.307,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,ns_server_sup}
Context: child_terminated
Reason: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
Offender: [{pid,<0.18558.13>},
{name,mb_master},
{mfargs,{mb_master,start_link,[]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]
[ns_server:error,2012-11-06T12:43:35.323,ns_1@10.2.3.31:<0.788.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 10203000 us
[couchdb:error,2012-11-06T12:43:41.588,ns_1@10.2.3.31:<0.24345.2>:couch_log:error:42]Uncaught error in HTTP request: {exit,
{timeout,{gen_server,call,[ns_config,get]}}}
Stacktrace: [{diag_handler,diagnosing_timeouts,1},
{menelaus_auth,check_auth,1},
{menelaus_auth,bucket_auth_fun,1},
{menelaus_auth,is_bucket_accessible,2},
{capi_frontend,do_db_req,2},
{couch_httpd,handle_request,6},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]
[error_logger:error,2012-11-06T12:43:41.604,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server disksup terminating
** Last message in was timeout
** When Server state == [{data,[{"OS",{win32,nt}},
{"Timeout",60000},
{"Threshold",80},
{"DiskData",
[{"C:\\",52324348,51},
{"E:\\",268432380,14}]}]}]
** Reason for termination ==
** {timeout,{gen_server,call,[os_mon_sysinfo,get_disk_info]}}
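To see at a glance which calls are timing out in a log like the one above, the timeout tuples can be tallied per target server. The following is only a rough sketch: the function name `count_timeouts` and the sample string are made up for illustration, and the pattern assumes both the server and the request are plain atoms, as they are in this excerpt.

```python
import re
from collections import Counter

# Matches {timeout,{gen_server,call,[Server,Request]}} where Server and
# Request are plain atoms. \s* tolerates the line-wrapped form that also
# appears in the log.
TIMEOUT_RE = re.compile(
    r"\{timeout,\s*\{gen_server,\s*call,\s*\[(\w+),\s*(\w+)\]\}\}"
)


def count_timeouts(log_text):
    """Count timed-out gen_server calls per (server, request) pair."""
    return Counter(TIMEOUT_RE.findall(log_text))


# A few lines copied from the excerpt above, used as sample input.
sample = """\
** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
{timeout,{gen_server,call,[ns_config,get]}}}
** {timeout,{gen_server,call,[os_mon_sysinfo,get_disk_info]}}
"""

counts = count_timeouts(sample)
for (server, request), n in counts.most_common():
    print(server, request, n)
```

Run against the full log, a tally like this may help show whether the timeouts are concentrated in one server or spread across unrelated ones (ns_node_disco, ns_config, os_mon_sysinfo), which would point at the node as a whole being stalled rather than at a single process.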
MB-6595. It remains to be seen whether this particular case is caused by a lack of async threads or by something else.