Home | Forums | Couchbase | Couchbase Server 2.0

Couchbase node crash

Tue, 06/26/2012 - 05:46
ARabus
Joined: 09/30/2011

Hi all,

Today our cluster lost a node. It is not clear why; see the log excerpts below. The node was failed over and the cluster kept running (as expected... ;)

Rejoining the cluster failed with some memcache errors on the faulty node.

What did help was removing the node from the cluster, purging the Couchbase installation, setting the node up again and adding the new blank node to the cluster.
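
The cluster-side part of that remove/re-add cycle (not the purge and reinstall on the node itself) can also be driven through the admin REST interface. The sketch below only builds the HTTP requests and does not send them. It assumes the Couchbase endpoints `/controller/rebalance` (with `knownNodes`/`ejectedNodes`) and `/controller/addNode` (with `hostname`/`user`/`password`) on admin port 8091; the admin URL and credentials are hypothetical placeholders, and the parameter names should be checked against the REST documentation for your server version.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical admin endpoint and credentials: replace with your own.
ADMIN_URL = "http://192.168.70.227:8091"
USER, PASSWORD = "Administrator", "password"


def _post(path: str, params: dict) -> Request:
    """Build (but do not send) a form-encoded POST with HTTP basic auth."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return Request(
        ADMIN_URL + path,
        data=urlencode(params).encode(),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def rebalance_request(known_nodes, ejected_nodes) -> Request:
    """Rebalance; nodes use the otpNode form from the logs, e.g. 'ns_1@192.168.70.204'."""
    return _post("/controller/rebalance", {
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


def add_node_request(hostname: str) -> Request:
    """Add a (freshly installed) node to the cluster by host name or IP."""
    return _post("/controller/addNode", {
        "hostname": hostname,
        "user": USER,
        "password": PASSWORD,
    })


if __name__ == "__main__":
    nodes = ["ns_1@192.168.70.204", "ns_1@192.168.70.227", "ns_1@192.168.70.228"]
    steps = [
        rebalance_request(nodes, ["ns_1@192.168.70.204"]),  # drop the failed-over node
        add_node_request("192.168.70.204"),                  # re-add it after purging
        rebalance_request(nodes, []),                        # move data back onto it
    ]
    for req in steps:
        # Send with urllib.request.urlopen(req) once the targets are verified.
        print(req.get_method(), req.full_url)
```

Note that the sketch does not wait for a rebalance to finish before moving to the next step, which a real script would need to do.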

Our question now is: what went wrong?

Does anybody have any clues?

Thanks,

ARabus

Some excerpts from the log:

First we got some heartbeat errors in the log:

[ns_1@192.168.70.204:system_stats_collector:system_stats_collector:handle_info:130] lost 1 ticks
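
Most log lines in these excerpts start with the same header, `[category:level] [date time] [node:process:module:function:line] message`, with the rest of an Erlang term spilling onto continuation lines. When digging through a full log it can help to keep only the header lines and filter them by level. A minimal Python sketch, assuming that layout holds (the regex and helper names are hypothetical, written against the excerpts here rather than taken from Couchbase):

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Header of an ns_server log line, as seen in the excerpts:
# [category:level] [YYYY-MM-DD H:MM:SS] [node:process:module:function:line] message
LINE_RE = re.compile(
    r"^\s*\[(?P<category>[^:\]]+):(?P<level>[^\]]+)\]\s+"
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})\]\s+"
    r"\[(?P<location>[^\]]*)\]\s*(?P<message>.*)$"
)


@dataclass
class LogEntry:
    category: str
    level: str
    timestamp: datetime
    location: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one header line; return None for continuation lines of a term."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return LogEntry(
        category=m["category"],
        level=m["level"],
        # %H accepts the unpadded hours used in these logs ("0:14:38").
        timestamp=datetime.strptime(m["timestamp"], "%Y-%m-%d %H:%M:%S"),
        location=m["location"],
        message=m["message"],
    )


def errors_only(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only the entries logged at level 'error'."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "error":
            yield entry
```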

The node did come back online, however, but failed to join the cluster. Something with memcache errors:

[error_logger:error] [2012-06-26 0:14:36] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_msg:76] ** State machine mb_master terminating
** Last message in was send_heartbeat
** When State == master
**      Data  == {state,<0.15647.1766>,'ns_1@192.168.70.204',
	['ns_1@192.168.70.204','ns_1@192.168.70.227',
	'ns_1@192.168.70.228'],
	{1340,662463,384085}}
** Reason for termination =
** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
[ns_server:error] [2012-06-26 0:14:38] [ns_1@192.168.70.204:'ns_memcached-assets':ns_memcached:handle_call:139] call {stats, <<>>} took too long: 10535791 us
[ns_doctor:error] [2012-06-26 0:14:38] [ns_1@192.168.70.204:<0.535.0>:ns_doctor:get_nodes:153] Error attempting to get nodes: {exit,
	{noproc,
	{gen_server,
	call,
	[ns_doctor,
	get_nodes]}}}
[menelaus:warn] [2012-06-26 0:14:47] [ns_1@192.168.70.204:<0.534.0>:menelaus_web:loop:357] Server error during processing: ["web request failed",
	{path,
	"/pools/default/bucketsStreaming/itunes"},
	{type,
	exit},
	{what,
	{timeout,
	{gen_server,
	call,
	[ns_cookie_manager,
	cookie_get]}}},
	{trace,
	[{gen_server,
	call,
	2},
	{menelaus_web,
	build_nodes_info_fun,
	3},
	{menelaus_web_buckets,
	build_bucket_node_infos,
	5},
	{menelaus_web_buckets,
	build_bucket_info,
	5},
	{menelaus_web,
	streaming_inner,
	3},
	{menelaus_web,
	handle_streaming,
	4},
	{menelaus_web_buckets,
	checking_bucket_access,
	4},
	{menelaus_web,
	loop,
	3}]}]
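
The ns_memcached line in this excerpt reports its slow call in microseconds: 10535791 us is about 10.5 seconds. For comparison, Erlang's `gen_server:call/2` gives up after 5 seconds by default, which is the kind of timeout the `{timeout,{gen_server,call,...}}` reasons above record. A small hypothetical helper (not part of Couchbase) that pulls every such duration out of a log and converts it to seconds:

```python
import re
from typing import List

# Matches the "took too long: <N> us" suffix written by ns_memcached.
SLOW_CALL_RE = re.compile(r"took too long: (\d+) us")


def slow_call_seconds(log_text: str) -> List[float]:
    """Return the duration of every reported slow call, in seconds."""
    return [int(us) / 1_000_000 for us in SLOW_CALL_RE.findall(log_text)]


if __name__ == "__main__":
    sample = "call {stats,\n<<>>} took too long: 10535791 us"
    print(slow_call_seconds(sample))  # [10.535791]
```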

And a crash report:
[error_logger:error] [2012-06-26 0:14:51] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
	crasher:
	initial call: mb_master:init/1
	pid: <0.15029.1766>
	registered_name: mb_master
	exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
	in function  gen_fsm:terminate/7
	ancestors: [ns_server_sup,ns_server_cluster_sup,<0.41.0>]
	messages: [{'$gen_event',
	{heartbeat,
[...]

And a supervisor error:

[error_logger:error] [2012-06-26 0:14:53] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
	Supervisor: {local,ns_server_sup}
	Context:    child_terminated
	Reason:     {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
	Offender:   [{pid,<0.15029.1766>},
	{name,mb_master},
	{mfargs,{mb_master,start_link,[]}},
	{restart_type,permanent},
	{shutdown,infinity},
	{child_type,supervisor}]
[ns_server:info] [2012-06-26 0:14:54] [ns_1@192.168.70.204:mb_master:mb_master:init:98] Starting as candidate. Peers: ['ns_1@192.168.70.204',
	'ns_1@192.168.70.227',
	'ns_1@192.168.70.228']
[ns_server:info] [2012-06-26 0:14:54] [ns_1@192.168.70.204:ns_config_rep:ns_config_rep:init:56] init pulling
[ns_server:info] [2012-06-26 0:14:54] [ns_1@192.168.70.204:mb_master:mb_master:candidate:244] Changing master from undefined to 'ns_1@192.168.70.227'
[ns_doctor:error] [2012-06-26 0:14:52] [ns_1@192.168.70.204:<0.535.0>:ns_doctor:get_nodes:153] Error attempting to get nodes: {exit,
	{noproc,
	{gen_server,
	call,
	[ns_doctor,
	get_nodes]}}}
[ns_server:info] [2012-06-26 0:14:54] [ns_1@192.168.70.204:ns_node_disco_events:ns_node_disco_log:handle_event:46] ns_node_disco_log: nodes changed: ['ns_1@192.168.70.204',
	'ns_1@192.168.70.227',
	'ns_1@192.168.70.228']
[ns_server:info] [2012-06-26 0:14:54] [ns_1@192.168.70.204:ns_config_rep:ns_config_rep:do_pull:257] Pulling config from: 'ns_1@192.168.70.227'
[stats:error] [2012-06-26 0:14:54] [ns_1@192.168.70.204:<0.10196.0>:stats_collector:handle_info:95] Exception in stats collector: {exit,
	{timeout,
	{gen_server,
	call,
	[{'couch_stats_reader-assets',
	'ns_1@192.168.70.204'},
	fetch_stats]}},
	[{gen_server,
	call,
	2},
	{couch_stats_reader,
	fetch_stats,
	1},
	{stats_collector,
	grab_all_stats,
	1},
	{stats_collector,
	handle_info,
	2},
	{gen_server,
	handle_msg,
	5},
	{proc_lib,
	init_p_do_apply,
	3}]}

And a bit later another exception occurred:

[stats:error] [2012-06-26 0:15:05] [ns_1@192.168.70.204:<0.10189.0>:stats_collector:handle_info:95] Exception in stats collector: {exit,
	{timeout,
	{gen_server,
	call,
	[{'couch_stats_reader-itunes',
	'ns_1@192.168.70.204'},
	fetch_stats]}},
	[{gen_server,
	call,
	2},
	{couch_stats_reader,
	fetch_stats,
	1},
	{stats_collector,
	grab_all_stats,
	1},
	{stats_collector,
	handle_info,
	2},
	{gen_server,
	handle_msg,
	5},
	{proc_lib,
	init_p_do_apply,
	3}]}
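
Most of the errors in these excerpts have the same shape, a `gen_server:call` that timed out, but the called process differs: `ns_node_disco`, `ns_cookie_manager` and the per-bucket `couch_stats_reader` processes. Tallying the targets makes that pattern easy to see across a longer log. Many unrelated local processes timing out together may point at the whole node being slow to respond rather than one component failing, though these excerpts alone cannot confirm that. A minimal sketch; the regex and function are hypothetical helpers written against the log format shown here:

```python
import re
from collections import Counter

# Captures the process passed to gen_server:call inside a timeout reason:
# either a bare atom (ns_node_disco) or a {Name, Node} tuple such as
# {'couch_stats_reader-assets', 'ns_1@192.168.70.204'}. \s* tolerates the
# line breaks and tabs that split these terms across log lines.
TIMEOUT_RE = re.compile(
    r"\{timeout,\s*\{gen_server,\s*call,\s*\[\s*(\{[^}]*\}|[^,\]\s]+)"
)


def timed_out_targets(log_text: str) -> Counter:
    """Count gen_server:call timeouts per called process."""
    # Normalise whitespace so a tuple split across lines gets a single key.
    return Counter(" ".join(t.split()) for t in TIMEOUT_RE.findall(log_text))


if __name__ == "__main__":
    sample = """
    ** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
    {exit,
        {timeout,
        {gen_server,
        call,
        [{'couch_stats_reader-itunes',
        'ns_1@192.168.70.204'},
        fetch_stats]}},
    """
    for target, count in timed_out_targets(sample).most_common():
        print(count, target)
```

The same failure is often logged several times (error message, crash report, supervisor report), so the counts are occurrences, not distinct incidents.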

And after a re-join we see this:

[couchdb:info] [2012-06-26 9:18:32] [ns_1@192.168.70.204:<0.5857.2328>:couch_log:info:39] mccouch is listening on port 11213
[error_logger:error] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_msg:76] Error in process <0.5855.2328> on node 'ns_1@192.168.70.204' with exit value: {{badmatch,{error,closed}},[{mc_connection,respond,5},{mc_tap,'-process_tap_stream/5-fun-0-',8},{couch_btree,stream_kv_node2,8},{couch_btree,stream_kp_node,7},{couch_btree,fold,4},{couch_db,changes_since,5},{couch_db... 
 
 
[error_logger:error] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.1450.2328> terminating 
** Last message in was {'EXIT',<0.1449.2328>,
                               {{badmatch,{error,closed}},
                                [{mc_connection,respond,5},
                                 {mc_tap,'-process_tap_stream/5-fun-0-',8},
                                 {couch_btree,stream_kv_node2,8},
                                 {couch_btree,stream_kp_node,7},
                                 {couch_btree,fold,4},
                                 {couch_db,changes_since,5},
                                 {couch_db,fast_reads,2},
                                 {mc_tap,process_tap_stream,5}]}}
** When Server state == {state,
                            {<0.1450.2328>,mc_batch_sup},
                            simple_one_for_one,
                            [{child,undefined,mc_batch_sup,
                                 {mc_batch_sup,start_link_worker,[]},
                                 temporary,3600000,worker,[]}],
                            undefined,0,1,[],mc_batch_sup,[]}
** Reason for termination == 
** {{badmatch,{error,closed}},
    [{mc_connection,respond,5},
     {mc_tap,'-process_tap_stream/5-fun-0-',8},
     {couch_btree,stream_kv_node2,8},
     {couch_btree,stream_kp_node,7},
     {couch_btree,fold,4},
     {couch_db,changes_since,5},
     {couch_db,fast_reads,2},
     {mc_tap,process_tap_stream,5}]}
 
[error_logger:error] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_report:72] 
=========================CRASH REPORT=========================
  crasher:
    initial call: supervisor:mc_batch_sup/1
    pid: <0.1450.2328>
    registered_name: []
    exception exit: {{badmatch,{error,closed}},
                     [{mc_connection,respond,5},
                      {mc_tap,'-process_tap_stream/5-fun-0-',8},
                      {couch_btree,stream_kv_node2,8},
                      {couch_btree,stream_kp_node,7},
                      {couch_btree,fold,4},
                      {couch_db,changes_since,5},
                      {couch_db,fast_reads,2},
                      {mc_tap,process_tap_stream,5}]}
      in function  gen_server:terminate/6
    ancestors: [<0.1449.2328>,<0.1448.2328>]
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 377
    stack_size: 24
    reductions: 150
  neighbours:
 
[error_logger:error] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_report:72] 
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,mc_sup}
     Context:    child_terminated
     Reason:     {{badmatch,{error,closed}},
                  [{mc_connection,respond,5},
                   {mc_tap,'-process_tap_stream/5-fun-0-',8},
                   {couch_btree,stream_kv_node2,8},
                   {couch_btree,stream_kp_node,7},
                   {couch_btree,fold,4},
                   {couch_db,changes_since,5},
                   {couch_db,fast_reads,2},
                   {mc_tap,process_tap_stream,5}]}
     Offender:   [{pid,<0.517.2328>},
                  {name,mc_tcp_listener},
                  {mfargs,{mc_tcp_listener,start_link,[11213]}},
                  {restart_type,permanent},
                  {shutdown,2000},
                  {child_type,worker}]
 
[error_logger:info] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_report:72] 
=========================PROGRESS REPORT=========================
          supervisor: {local,mc_sup}
             started: [{pid,<0.5857.2328>},
                       {name,mc_tcp_listener},
                       {mfargs,{mc_tcp_listener,start_link,[11213]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]
 
[error_logger:error] [2012-06-26 9:18:32] [ns_1@192.168.70.204:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.534.2328> terminating 
** Last message in was {'EXIT',<0.533.2328>,
                               {{badmatch,{error,closed}},
                                [{mc_connection,respond,5},
                                 {mc_tap,'-process_tap_stream/5-fun-0-',8},
                                 {couch_btree,stream_kv_node2,8},
                                 {couch_btree,stream_kp_node,7},
                                 {couch_btree,fold,4},
                                 {couch_db,changes_since,5},
                                 {couch_db,fast_reads,2},
                                 {mc_tap,process_tap_stream,5}]}}
** When Server state == {state,
                            {<0.534.2328>,mc_batch_sup},
                            simple_one_for_one,
                            [{child,undefined,mc_batch_sup,
                                 {mc_batch_sup,start_link_worker,[]},
                                 temporary,3600000,worker,[]}],
                            undefined,0,1,[],mc_batch_sup,[]}
** Reason for termination == 
** {{badmatch,{error,closed}},
    [{mc_connection,respond,5},
     {mc_tap,'-process_tap_stream/5-fun-0-',8},
     {couch_btree,stream_kv_node2,8},
     {couch_btree,stream_kp_node,7},
     {couch_btree,fold,4},
     {couch_db,changes_since,5},
     {couch_db,fast_reads,2},
     {mc_tap,process_tap_stream,5}]}
 
[...]
[ns_server:info] [2012-06-26 9:18:32] [ns_1@192.168.70.204:<0.10041.0>:ns_port_server:log:161] memcached<0.10041.0>: Rubbish received on the backend stream. closing it

After that there were no more errors, but all requests to that faulty node return empty results.

Sun, 07/08/2012 - 20:53
ingenthr
Joined: 03/16/2010

Looking this over, I have no immediate answer as to why re-adding the node has been a problem. I'll see if I can get a colleague to look it over.

© 2013 COUCHBASE All rights reserved.
