[MB-4765] Erlang dump created on membase server restart Created: 23/Jan/12  Updated: 13/May/12  Resolved: 25/Apr/12

Status: Resolved
Project: Couchbase Server
Component/s: None
Affects Version/s: 1.7.2, 1.8.0
Fix Version/s: 1.8.1
Security Level: Public

Type: Bug Priority: Major
Reporter: James Mauss Assignee: Aleksey Kondratenko
Resolution: Fixed Votes: 0
Labels: 1.8.1-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1.7.2

Attachments: File erl_crash.dump     Text File JDPCachePrimary_01112012a.log    

 Description   
After a recent OS update and reboot, the 2-server clusters did not reload from disk successfully, and a membase-server restart resulted in an Erlang error.

While the cluster was warming up, both nodes responded to a membase-server restart command with: {"init terminating in do_boot",{{badmatch,{error,{shutdown,{ns_server,start,[normal,[]]}}}},[{init,start_it,1},{init,start_em,1}]}}

Erlang has closed /opt/membase/lib/erlang/lib/os_mon-2.2.6/priv/bin/memsup: Erlang has closed.
Crash dump was written to: erl_crash.dump

Once the warmup finished the cluster was back in a working state.

Logs and crash file attached.

 Comments   
Comment by James Mauss [ 03/Feb/12 ]
The customer would like to know why the Erlang dump was created.
Comment by Aleksey Kondratenko [ 03/Feb/12 ]
Another copy of Erlang was still running.

We have a known issue, fixed in 2.0, where the initscript stop action merely sends a shutdown signal to ns_server without waiting for the actual shutdown. The actual shutdown waits until memcached finishes persisting its data, so it may take some time. Thus initscript restart doesn't really work in most real-world cases in 1.7 and current 1.8.
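
To illustrate the difference between "signal and return" and "signal and wait", here is a minimal Python sketch of a stop step that blocks until the old process is really gone before a restart proceeds. It is not the actual initscript or the 2.0 fix; the function names, the SIGTERM choice, the timeout, and the polling approach are all assumptions made for illustration.

```python
import os
import signal
import time


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(is_running, timeout, poll_interval=1.0):
    """Poll is_running() until it returns False or the timeout expires.

    Returns True if the process went away, False if it was still
    running when the deadline passed.
    """
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def stop_and_wait(pid, timeout=600.0, poll_interval=1.0):
    """Hypothetical 'stop' action: send SIGTERM, then wait for the exit.

    A 'restart' built on this would start the new server only when
    this returns True, so two copies never run at the same time.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone, nothing to wait for
    return wait_for_exit(lambda: pid_is_running(pid), timeout, poll_interval)
```

The timeout here is deliberately generous, since the shutdown may have to wait for data to be persisted; a restart should report failure rather than start a second copy if `stop_and_wait` returns False.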
Comment by Aleksey Kondratenko [ 03/Feb/12 ]
Added a Pivotal story to address that for 1.8.1: https://www.pivotaltracker.com/projects/212245
Comment by Aleksey Kondratenko [ 25/Apr/12 ]
Pivotal link is broken.

Anyway, this is done. 1.8.1 has the reliable shutdown backported from 2.0.