[MB-4006] mb_mnesia,ensure_schema timeout when restarting ns_server Created: 22/Jun/11  Updated: 09/Apr/13  Resolved: 08/Apr/13

Status: Resolved
Project: Couchbase Server
Component/s: ns_server
Affects Version/s: 1.7 GA, 1.7.2
Fix Version/s: None
Security Level: Public

Type: Bug Priority: Critical
Reporter: Farshid Ghods (Inactive) Assignee: Aleksey Kondratenko
Resolution: Incomplete Votes: 1
Labels: 1.7.0-release-notes, 1.8.0-release-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: 1.7 GA

Attachments: File log.rar    

 Description   
Users run into this issue too often. I suspect it has something to do with uptime, or that an unclean shutdown of the server can result in this behavior.

Asking customers to delete their statistics is not a very pleasant thing to do.



 Comments   
Comment by Farshid Ghods (Inactive) [ 22/Jun/11 ]
When a process is killed or dies, mnesia files might be left in a poor state that prevents restart.
Comment by Farshid Ghods (Inactive) [ 30/Jun/11 ]
The workaround is to remove all files under /opt/membase/var/lib/membase/mnesia/*.*
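The workaround above could be scripted roughly as follows. This is only a sketch in Python, not a supported tool. It assumes ns_server is stopped before it runs, and it moves the files into a backup directory instead of deleting them, which is an addition of this sketch and not part of the workaround as stated. The function name and the backup location are hypothetical.

```python
import shutil
import time
from pathlib import Path

# Path taken from the comment above; adjust it for your installation.
MNESIA_DIR = Path("/opt/membase/var/lib/membase/mnesia")


def set_aside_mnesia_files(mnesia_dir, backup_root):
    """Move every entry out of mnesia_dir into a fresh backup directory.

    Moving instead of deleting is an assumption of this sketch, not part of
    the original workaround. Returns the names of the moved entries.
    """
    mnesia_dir = Path(mnesia_dir)
    backup_dir = Path(backup_root) / time.strftime("mnesia-%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=False)
    moved = []
    for entry in sorted(mnesia_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    return moved


# Example call (run only while ns_server is stopped; "/tmp" is a placeholder):
# set_aside_mnesia_files(MNESIA_DIR, "/tmp")
```

Once the directory is empty, start ns_server again. The stats history stored in mnesia is lost either way; the backup only makes the step reversible.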
Comment by Aleksey Kondratenko [ 05/Jul/11 ]
fix merged
Comment by Farshid Ghods (Inactive) [ 17/Apr/12 ]
Reported by a user on 1.7.2.
Diags here: http://www63.zippyshare.com/v/19241756/file.html
Comment by Aleksey Kondratenko [ 02/May/12 ]
Thanks for raising this again.

Looks like our workaround does not work all the time. For now, the manual workaround of deleting the mnesia files and starting ns_server again will work.

In the future we'll get rid of mnesia completely.
Comment by Aleksey Kondratenko [ 02/May/12 ]
Dipti, maybe I'm missing something and stats-in-mnesia is not "unfixable crap". But my plan since before 1.7.0 has been to replace saving stats in mnesia with much simpler periodic snapshotting into a plain file. Because stats are not really precious thing.

I estimate this work to be around 1-2 days including thorough testing.

We also had some plans for storing stats in couch, but given the KISS principle I think my simplified approach is what we need.

Please decide when we want this work to be done.
Comment by Aleksey Kondratenko [ 02/May/12 ]
Let me explain "not really precious thing" better. Apparently it's too easy to misunderstand my original phrase as "we shouldn't save stats at all".

Mnesia does its own journalling and crash recovery and durable commits and stuff like that. Or supposedly does.

My point was that we don't need _that_ level of durability. That means that instead of trying to commit a supposedly durable transaction via mnesia after each stats sample, we'll accumulate changes and save them once or twice per minute. The save itself (because it's not much data) can be made durable and cheap.
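As an illustration only, the accumulate-then-snapshot idea can be sketched in Python. This is not the ns_server code (ns_server is written in Erlang), and the class name, the JSON file format, and the 30-second interval are assumptions made for the example. Samples are buffered in memory; when the interval has elapsed, the whole buffer is written to a temporary file, fsynced, and atomically renamed over the previous snapshot, so a crash leaves either the old or the new file intact.

```python
import json
import os
import time


class StatsSnapshotter:
    """Buffer stats samples in memory; persist them at most once per interval."""

    def __init__(self, path, interval_s=30.0, clock=time.monotonic):
        self.path = path
        self.interval_s = interval_s  # hypothetical default: twice per minute
        self.clock = clock
        self.samples = []
        self.last_save = clock()

    def add_sample(self, sample):
        """Record one sample; return True if this call wrote a snapshot."""
        self.samples.append(sample)
        if self.clock() - self.last_save >= self.interval_s:
            self.save()
            return True
        return False

    def save(self):
        """Durably replace the snapshot file with the current samples."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.samples, f)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename
        os.replace(tmp, self.path)  # atomic swap of old and new snapshot
        # fsync the directory so the rename itself survives a crash (POSIX).
        dir_fd = os.open(os.path.dirname(os.path.abspath(self.path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
        self.last_save = self.clock()


def load_snapshot(path):
    """Return the saved samples, or an empty list if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return []
```

The trade-off is the one described above: a crash loses at most the samples collected since the last snapshot, in exchange for one cheap write per interval instead of one transaction per sample. A real stats archive would also bound the buffer, which this sketch does not do.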

Comment by Daniel [ 02/May/12 ]
Attaching the relevant logs.
Comment by Aleksey Kondratenko [ 08/Apr/13 ]
Invalidated by removal of mnesia
Comment by Dipti Borkar [ 08/Apr/13 ]
Have you already removed mnesia?

If so, is there any impact on the upgrade process?
Comment by Aleksey Kondratenko [ 09/Apr/13 ]
The code was actually ready for 2.0.1 but sadly too late. So 2.0.2 got it early.

The impact on the upgrade process is simple: people will lose their stats archive, i.e. we don't bother converting stats from mnesia to the new format. But note that a rolling upgrade already causes them to lose historical stats.
Generated at Wed Dec 17 22:17:34 CST 2014 using JIRA 5.2.4#845-sha1:c9f4cc41abe72fb236945343a1f485c2c844dac9.