Process Crash Loop.

Fri, 04/22/2011 - 12:37
wstiern
Offline
Joined: 03/11/2011
Groups: None

Need some help with logfile analysis. Looks like communications issues of some sort are causing processes to crash and restart constantly. Lots of the following bits in logs on all nodes:

ERROR REPORT  <5597.5916.936>                               2011-04-22 13:09:44
===============================================================================
 
ns_1@10.101.5.181:ns_memcached:374: Unable to connect: {error,
                                                        {badmatch,
                                                         {error,
                                                          econnrefused}}}, retrying.

This is repeated a bajillion times, then:

ERROR REPORT  <5597.5815.936>                               2011-04-22 13:09:44
===============================================================================
 
** Generic server <5597.5815.936> terminating 
** Last message in was {#Port<5597.28202070>,{exit_status,134}}
** When Server state == {state,#Port<5597.28202070>,memcached,
                               {["memcached: stored-value.hh:974: add_type_t HashTable::add(const Item&, bool, bool): Assertion `v->isDirty() == isDirty' failed.",
                                 "Item was expired at load:  530734zMCEeNfJs85z"],
                                ["Item was expired at load:  530734klPJrGH0Hsjc"]},
                               {ok,{1303492184891651,
                                    #Ref<5597.0.1973.200420>}},
                               ["Item was expired at load:  533406XCeVuIJBMPNF",
                                "Item was expired at load:  533406twnwhPE4OhiC",
                                "Item was expired at load:  5313992tePTLISLPLM",
                                "Item was expired at load:  531399LyWwvVzKuYw2",
                                "Item was expired at load:  533406Q08ygjQkH6vN",
                                "Item was expired at load:  533406cZj1hVJqXOeB",
                                "Item was expired at load:  533406QpeJ8WaU7vM2",
                                "Item was expired at load:  533406fuRZwT6zPiHM",
                                "Item was expired at load:  533406EODkYgy4ipxC",
                                "Item was expired at load:  531399c45mzSAJcPD3"],
                               2394}
** Reason for termination == 
** {abnormal,134}
 
CRASH REPORT  <5597.5815.936>                               2011-04-22 13:09:44
===============================================================================
Crashing process                                                               
   initial_call                           {ns_port_server,init,['Argument__1']}
   pid                                                          <5597.5815.936>
   registered_name                                                           []
   error_info
         {exit,{abnormal,134},
              [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}
   ancestors
         [<5597.5814.936>,ns_port_sup,ns_server_sup,ns_server_cluster_sup,
         <5597.52.0>]
   messages                              [{'EXIT',#Port<5597.28202070>,normal}]
   links                                                      [<5597.5814.936>]
   dictionary                                                                []
   trap_exit                                                               true
   status                                                               running
   heap_size                                                               6765
   stack_size                                                                24
   reductions                                                            440446
 
INFO REPORT  <5597.5814.936>                                2011-04-22 13:09:44
===============================================================================
 
Cushion managed supervisor for memcached failed:  {abnormal,134}
 
ERROR REPORT  <5597.5814.936>                               2011-04-22 13:09:44
===============================================================================
 
** Generic server <5597.5814.936> terminating 
** Last message in was {die,{error,cushioned_supervisor,{abnormal,134}}}
** When Server state == {state,memcached,5000,{1303,492177,453890},undefined}
** Reason for termination == 
** {error,cushioned_supervisor,{abnormal,134}}
 
CRASH REPORT  <5597.5814.936>                               2011-04-22 13:09:44
===============================================================================
Crashing process                                                               
   initial_call                       {supervisor_cushion,init,['Argument__1']}
   pid                                                          <5597.5814.936>
   registered_name                                                           []
   error_info
         {exit,{error,cushioned_supervisor,{abnormal,134}},
              [{gen_server,terminate,6},{proc_lib,init_p_do_apply,3}]}
   ancestors      [ns_port_sup,ns_server_sup,ns_server_cluster_sup,<5597.52.0>]
   messages                                                                  []
   links                                                         [<5597.106.0>]
   dictionary                                                                []
   trap_exit                                                               true
   status                                                               running
   heap_size                                                                377
   stack_size                                                                24
   reductions                                                               216

Then we get scads of supervisor reports. Can I send you some logs, Perry?
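For a dump this repetitive, a quick tally of the recurring signatures can help separate the symptom (connection refusals) from the cause (the memcached abort). The following is only a rough sketch: the function name and regular expressions are illustrative assumptions based on the excerpts above, not part of any Membase tooling.

```python
import re
from collections import Counter

def summarize_log(text):
    """Tally recurring error signatures in an ns_server-style log dump.

    The patterns are guesses derived from the excerpts in this thread.
    """
    counts = Counter()
    # Connection refusals reported while memcached is down or restarting.
    counts["econnrefused"] = len(re.findall(r"\beconnrefused\b", text))
    # C assert() failures forwarded from memcached's output.
    counts["assertion_failed"] = len(re.findall(r"Assertion `[^']*' failed", text))
    # Exit statuses reported by the Erlang port that wraps memcached.
    exit_codes = Counter(int(c) for c in re.findall(r"\{exit_status,(\d+)\}", text))
    return counts, exit_codes

sample = """ns_memcached:374: Unable to connect: {error,{badmatch,{error,econnrefused}}}, retrying.
** Last message in was {#Port<5597.28202070>,{exit_status,134}}
memcached: stored-value.hh:974: Assertion `v->isDirty() == isDirty' failed.
"""

counts, exit_codes = summarize_log(sample)
print(dict(counts), dict(exit_codes))
```

If the refusals vastly outnumber the assertion failures, that would be consistent with a single crash producing many retry messages while the process restarts.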

Fri, 04/22/2011 - 15:23
perry
Offline
Joined: 10/11/2010
Groups:

This line points to a bug fixed in 1.6.5.3:

{["memcached: stored-value.hh:974: add_type_t HashTable::add(const Item&, bool, bool): Assertion `v->isDirty() == isDirty' failed.",

What version are you running?
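As a cross-check, the {exit_status,134} in the report is consistent with an assertion failure, assuming the usual POSIX convention where a status above 128 means "killed by signal (status - 128)". A minimal sketch of that decoding follows; the helper name is hypothetical and the convention is assumed to apply to this port.

```python
import signal

def decode_exit_status(status):
    """Return the signal name implied by a 128+N exit status, or None.

    Assumes the common POSIX convention; it is not specific to Membase.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return None
    return None

print(decode_exit_status(134))
```

On Linux, signal 6 is SIGABRT, which is what abort() raises when a C assert() fails.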

Perry

__________________

Forum support is great for free but sometimes you need a guaranteed response time and dedicated resources for your questions or issues.
Consider purchasing enterprise-level support from Couchbase: http://www.couchbase.com/products-and-services/overview
Call or email "sales -at- couchbase-dot- com" today!

Fri, 04/22/2011 - 16:23
wstiern
Offline
Joined: 03/11/2011
Groups: None

1.6.5.

Time for an upgrade?

Fri, 04/22/2011 - 16:26
perry
Offline
Joined: 10/11/2010
Groups:

Indeed.

While you're at it, take a look at our 1.7 pre-release and let me know what you think: http://techzone.couchbase.com/forums/thread/membase-server-17-developer-...

Perry

Mon, 04/25/2011 - 06:38
wstiern
Offline
Joined: 03/11/2011
Groups: None

Perry,

Seeing the below error every time I try to remove/rebalance a node:

INFO REPORT  <5597.28355.1061>                              2011-04-25 09:21:00
===============================================================================
 
ns_1@10.101.5.181:ns_rebalancer:420: Waiting for ['ns_1@10.101.5.182',
                                                  'ns_1@10.101.5.183',
                                                  'ns_1@10.101.5.184',
                                                  'ns_1@10.101.5.185',
                                                  'ns_1@10.101.5.186']
 
[previous message repeated every second]
 
INFO REPORT  <5597.157.0>                                   2011-04-25 09:21:07
===============================================================================
 
ns_log: logging ns_orchestrator:2:Rebalance exited with reason wait_for_memcached_failed

Looks like a simple timeout, but it's unshakeable. Hard to upgrade the cluster when I can't remove nodes. Any assistance you could lend would be appreciated.

Mon, 04/25/2011 - 10:33
perry
Offline
Joined: 10/11/2010
Groups:

If one node is continuously restarting (as per your previous error message) you won't be able to rebalance it out of the cluster since we can't pull the necessary data off of it.

Your best option would be to fail that node over, upgrade it and add it back to the cluster.

If there are more nodes crashing than you have replicas available, you'll have to do an "in place" upgrade, which means shutting all the nodes down, upgrading them and restarting.

Make sense? Make sure to check out the release notes and upgrade instructions for 1.6.5.3: http://techzone.couchbase.com/wiki/display/membase/Membase+Server+1.6.5.3
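The choice between the two paths above can be sketched as a small decision function. This is an illustration only: the function name, its arguments and the returned labels are hypothetical, and the zero-crash case is an assumption added for completeness rather than something stated in this thread.

```python
def choose_upgrade_path(crashing_nodes, replicas):
    """Pick an upgrade approach from the number of crash-looping nodes.

    crashing_nodes: nodes stuck in a restart loop.
    replicas: replica copies configured for the bucket.
    """
    if crashing_nodes == 0:
        # Assumption: with every node healthy, a normal rebalance works.
        return "rebalance out, upgrade, rebalance in"
    if crashing_nodes <= replicas:
        # Replicas still cover the data held by the failing nodes.
        return "fail over, upgrade, add back"
    # More failing nodes than replicas: fall back to the offline route.
    return "in-place: shut down all nodes, upgrade, restart"

print(choose_upgrade_path(crashing_nodes=1, replicas=1))
print(choose_upgrade_path(crashing_nodes=3, replicas=1))
```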

Perry
