Couchbase
  • Why NoSQL?
  • Couchbase Server
  • Download
  • Resources
  • Careers
Home | Forums | Couchbase | Couchbase Server 2.0

Node down, impossible to get it back up

2 replies [Last post]
  • Login or register to post comments
Fri, 12/23/2011 - 03:45
vingab
Offline
Joined: 12/23/2011
Groups: None

Hi all,

We have run into an issue overnight.

First off, our config is :
- We only have 1 node (4GB of RAM) and 1 bucket ('default' bucket)
- We run Couchbase 2.0 Dev Preview 3
- The server is homed in a large Amazon EC2 instance (2 virtuals cores)

In our Couchbase Manager interface (localhost:8091, obtained after a "/etc/init.d/couchbase-server start" command), our active server is "down" and we can't understand why we can't access to the data in the bucket.

The logs that were obtained from : var/lib/couchbase/logs/errors :

stats:error] [2011-12-23 10:17:51] [ns_1@127.0.0.1:<0.1900.9>:stats_reader:log_bad_responses:191] Some nodes didn't respond: ['ns_1@127.0.0.1']
[error_logger:error] [2011-12-23 10:17:52] [ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,menelaus_sup}
     Context:    child_terminated
     Reason:     {noproc,{gen_server,call,
                                     ['ns_memcached-default',topkeys,30000]}}
     Offender:   [{pid,<0.1599.9>},
                  {name,hot_keys_keeper},
                  {mfargs,{hot_keys_keeper,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,5000},
                  {child_type,worker}]
 
 
[ns_server:error] [2011-12-23 10:17:58] [ns_1@127.0.0.1:<0.7176.9>:ns_janitor:do_cleanup:68] Bucket "default" not yet ready on ['ns_1@127.0.0.1']
[stats:error] [2011-12-23 10:17:57] [ns_1@127.0.0.1:<0.1900.9>:stats_reader:log_bad_responses:191] Some nodes didn't respond: ['ns_1@127.0.0.1']
[error_logger:error] [2011-12-23 10:18:02] [ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,menelaus_sup}
     Context:    child_terminated
     Reason:     {noproc,{gen_server,call,
                                     ['ns_memcached-default',topkeys,30000]}}
     Offender:   [{pid,<0.7194.9>},
                  {name,hot_keys_keeper},
                  {mfargs,{hot_keys_keeper,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,5000},
                  {child_type,worker}]
 
 
[stats:error] [2011-12-23 10:18:00] [ns_1@127.0.0.1:<0.18155.5>:stats_reader:log_bad_responses:191] Some nodes didn't respond: ['ns_1@127.0.0.1']
[ns_server:error] [2011-12-23 10:18:07] [ns_1@127.0.0.1:xdc_rep_manager:xdc_rep_manager:handle_info:263] database update notifier died. Reason: killed
[error_logger:error] [2011-12-23 10:18:07] [ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_msg:76] Error in process <0.16293.9> on node 'ns_1@127.0.0.1' with exit value: {{badmatch,{ok,<<295271 bytes>>}},[{couch_file,read_raw_iolist_int,3},{couch_file,maybe_read_more_iolist,4},{couch_file,reader_loop,1}]}
[stats:error] [2011-12-23 10:18:05] [ns_1@127.0.0.1:<0.7201.9>:stats_reader:log_bad_responses:191] Some nodes didn't respond: ['ns_1@127.0.0.1']
[ns_server:error] [2011-12-23 10:18:16] [ns_1@127.0.0.1:<0.16329.9>:ns_janitor:do_cleanup:68] Bucket "default" not yet ready on ['ns_1@127.0.0.1']
[stats:error] [2011-12-23 10:18:10] [ns_1@127.0.0.1:<0.10559.9>:stats_reader:log_bad_responses:191] Some nodes didn't respond: ['ns_1@127.0.0.1']
[error_logger:error] [2011-12-23 10:18:17] [ns_1@127.0.0.1:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server xdc_rep_manager terminating
** Last message in was {'EXIT',<0.1625.9>,killed}
** When Server state == {rep_db_state,<0.1624.9>,<<"_replicator">>,<0.1625.9>}** Reason for termination ==
** {timeout,
       {gen_server,call,
           ['logger-ns_server',
            {log,
                {log_info,ns_server,error,xdc_rep_manager,handle_info,263,
                    {1324,635487,988595},
                    xdc_rep_manager,'ns_1@127.0.0.1',undefined},
                "database update notifier died. Reason: ~p",
                [killed]}]}} 

When trying to access a document that we know exists in the database (with key "list_sessions"), the following error is brought up :

Memcached::ProtocolError: Key {"fb:50"=>"127.0.0.1:11211:8"}
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/memcached-1.3.5/lib/memcached/memcached.rb:613:in `reraise'
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/memcached-1.3.5/lib/memcached/memcached.rb:593:in `check_return_code'
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/memcached-1.3.5/lib/memcached/memcached.rb:512:in `get'
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/couchbase-0.9.7/lib/couchbase/memcached.rb:130:in `get'
	from /home/funkytown/funkytown/app/models/user.rb:68:in `find'
	from (irb):11
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/railties-3.1.3/lib/rails/commands/console.rb:45:in `start'
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/railties-3.1.3/lib/rails/commands/console.rb:8:in `start'
	from /usr/local/ruby/lib/ruby/gems/1.9.1/gems/railties-3.1.3/lib/rails/commands.rb:40:in `<top (required)>'
	from script/rails:6:in `require'
	from script/rails:6:in `<main>'

The problem seems to have occurred last night, time from which the logs start to look odd. On a normal basis, the logs of the couchbase interface often show a series of shutting downs of the bucket followed by loadings of the same bucket. This time however, only shutting downs take place as you can see :

2011-12-22 20:09:49.964 ns_cookie_manager:2:info:cookie update(ns_1@127.0.0.1) - Node 'ns_1@127.0.0.1' synchronized otp cookie jqktjyrvlnxeahky from cluster
2011-12-22 20:09:50.034 menelaus_sup:1:info:web start ok(ns_1@127.0.0.1) - Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'.
2011-12-22 20:09:55.629 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown
2011-12-22 20:15:50.917 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown (repeated 2 times)
2011-12-22 20:15:58.915 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown
2011-12-22 20:21:49.951 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown (repeated 3 times)
2011-12-22 20:22:49.325 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown
2011-12-22 20:27:49.956 ns_memcached:2:info:message(ns_1@127.0.0.1) - Shutting down bucket "default" on 'ns_1@127.0.0.1' for server shutdown (repeated 1 times)

Do you have an idea on why this suddenly happened ?

Any help you could give us would be very much appreciated. We would like to know where this could come from, seeing as we intend on using the service for our casual game application, and reinstalling Couchbase everytime is not a viable option.

Thank you for your attention,

Vincent

Top
  • Login or register to post comments
Thu, 12/29/2011 - 19:17
ingenthr
Offline
Joined: 03/16/2010
Groups:

First off, thanks so much for the detailed report.

I don't immediately know of any issues that would cause this problem. Anything that is causing a protocol violation (as seen from the ruby client) is definitely an issue we want to look into. I've just filed:
http://www.couchbase.org/issues/browse/MB-4600

Is there any chance you can get a packet capture of port 11211, port 11210, port 8091 when it's in a failed state while you try to make the same request from the Ruby client? This may give us some clues as to what's happening at the protocol level. You should be able to do this with tcpdump.

If you can, please attach it to that issue above.

Top
  • Login or register to post comments
Mon, 01/02/2012 - 07:50
alkondratenko
alkondratenko's picture
Offline
Joined: 12/01/2010
Groups: None

We need fuller logs to diagnose your case. This could be something as simple as end of disk space. Or something more interesting. Can you grab diagnostics via clicking on "Log" in top menu and then "Generate Diagnostic Report" ?

Top
  • Login or register to post comments
  • Login or register to post comments
  • Login
  • Register

Company

  • About Us
  • Leadership
  • Customers
  • Partners
  • Contact Us

Product

  • Couchbase Server
  • Couchbase SDKs
  • Use Cases
  • Documentation
  • Forums

Open Source

  • Couchbase Project
  • Couchbase vs. CouchDB

Commercial

  • Subscriptions & Support
  • Training & Services

News

  • Blog
  • Newsletter
  • Press Releases
  • Buzz

Follow Us

    
  • Customer Login
  • Terms of Service
  • Privacy Policy
  • Trademark Policy
  • Site Map

© 2013 COUCHBASE All rights reserved.

Sign in to Couchbase Community

close
  • Create new account
  • Request new password
You are logging into the Forums, Wiki and Issue Tracker