Couchbase crashes at specific time (2.2.0 community edition)

tech_johan · June 23, 2025, 3:22pm

Hi all,

I have “inherited” a rather complex (and old) IT environment consisting of one database server (Oracle), three application servers, and two webservers. One app server runs Couchbase, one FAST ESP, and one Luke (Lucene). These three app servers feed two different websites on the two webservers.

I have an urgent issue where the Couchbase server crashes at exactly 2 AM. The service is then unavailable until the next day at exactly 12 AM. The strange thing is that the crash window moves when we go from standard time to daylight saving time: during standard time the window is from 1 AM to 11 AM. After the “crash window”, I restart Couchbase and everything is blistering fast again.
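A quick way to check whether the window is really fixed in UTC is to convert both reported windows. This sketch takes the end of the window as noon (12 PM), as clarified later in the thread, and assumes the server clock follows Swedish local time, modelled as fixed offsets (CET = UTC+1 in winter, CEST = UTC+2 in summer). The helper name window_in_utc and the winter date are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs on Swedish local time.
# CET (standard time) is UTC+1, CEST (daylight saving time) is UTC+2.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def window_in_utc(start_hour, end_hour, tz, day):
    """Convert a same-day local window [start_hour, end_hour) to UTC."""
    start = day.replace(hour=start_hour, tzinfo=tz)
    end = day.replace(hour=end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


# Reported windows: 2 AM-12 PM in summer, 1 AM-11 AM in winter.
summer = window_in_utc(2, 12, CEST, datetime(2025, 6, 23))
winter = window_in_utc(1, 11, CET, datetime(2025, 1, 15))  # illustrative date

for label, (start, end) in (("CEST", summer), ("CET", winter)):
    # prints e.g. "CEST 00:00 -> 10:00 UTC duration: 10:00:00"
    print(label, start.strftime("%H:%M"), "->", end.strftime("%H:%M"),
          "UTC", "duration:", end - start)
```

If those assumptions hold, both windows come out as 00:00 to 10:00 UTC, which would point to a trigger that follows UTC rather than local time.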

I have searched everywhere for a clue to this issue but can’t find anything. I have also done some fairly deep troubleshooting, but found nothing that explains this behavior.

In Couchbase there is one Webcache bucket and one Memcached bucket serving the two websites in different ways.

I’m in no way a Couchbase expert (more of a newbie), but I’m experienced in IT. As a bonus, the IT environment is mostly undocumented, so I’m stumbling in the dark here.

Have any of you heard of a similar issue? Do you have any tips for further troubleshooting?

Many thanks in advance!

Cheers, Johan

mreiche · June 23, 2025, 3:37pm

What do you have from the crash? Messages? Logs? Core?

tech_johan · June 23, 2025, 4:04pm

Thanks for replying! I have a lot of information; in fact, so much that I don’t know where to begin. The first second of the crash alone is 609 lines in the error log. Is it okay to post that much text here? Should I post it as a blockquote or attach a text file? Thanks again.

mreiche · June 23, 2025, 4:31pm

Try posting it as a file. If that doesn’t work, post it inline.

mreiche · June 23, 2025, 7:59pm

Wouldn’t restarting manually restore service?
You said it crashes at 2am and is unavailable until 12am. So it’s only available for the two hours from 12am to 2am? Or did you mean it is unavailable until 12pm (noon)?
You might want to check for something scheduled to run at 2am, or for client activity at 2am.
Or try to track down what changed around the time the 2am crashes started.

tech_johan · June 23, 2025, 8:28pm

Sorry, I mean 12PM. I’m in Sweden, so I’m not used to AM/PM. The downtime is always exactly ten hours. Restarting the Couchbase service during the window doesn’t help, and if it is still running at 12PM I need to restart it for Couchbase to come alive. After the restart it only takes seconds for the websites to be blistering fast again.

I have uploaded a zip file with a snippet from the error file in var\lib\couchbase\logs containing the first second of the crash.

Thanks for your help,
Johan
Couchbase_errorlog_snippet_250623_0200.zip (3.0 KB)

mreiche · June 23, 2025, 8:52pm

If I search Jira tickets for that error, I find these.

The crash report looks somewhat like Jira

[stats:error,2025-06-23T2:00:02.317,ns_1@127.0.0.1:<0.12736.0>:stats_collector:handle_info:106]Exception in stats collector: {exit,
                                  {{badmatch,{error,closed}},
                                   {gen_server,call,
                                       ['ns_memcached-default',
                                        {stats,<<>>},
                                        180000]}},
                                  [{gen_server,call,3},
                                   {ns_memcached,do_call,3},
                                   {stats_collector,grab_all_stats,1},
                                   {stats_collector,handle_info,2},
                                   {gen_server,handle_msg,5},
                                   {proc_lib,init_p_do_apply,3}]}
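To see whether errors cluster at the same hour every night, a short script can tally error entries per hour and category. This is a sketch that assumes every entry starts with a header shaped like the one above (category:level, then a timestamp, then the node); the function and script names are hypothetical, and continuation lines of multi-line entries are simply skipped.

```python
import re
import sys
from collections import Counter

# Assumed entry header, modelled on the stats_collector entry above:
#   [stats:error,2025-06-23T2:00:02.317,ns_1@127.0.0.1:<pid>:module:function:line]
# The hour in that sample is not zero-padded, so one or two digits are accepted.
HEADER = re.compile(
    r"^\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<date>\d{4}-\d{2}-\d{2})T(?P<hour>\d{1,2}):\d{2}:\d{2}\.\d+,"
)


def errors_per_hour(lines):
    """Count entries logged at level 'error', keyed by (date, hour, category).

    Lines that do not start with a header (continuation lines of
    multi-line entries) are ignored."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match and match.group("level") == "error":
            key = (match.group("date"), int(match.group("hour")),
                   match.group("category"))
            counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python errors_per_hour.py <error-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (date, hour, category), n in sorted(errors_per_hour(log).items()):
            print(f"{date} {hour:02d}:00 {category:<15} {n}")
```

Running it over several days of logs would show whether the error burst always starts in the same hour.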

tech_johan · June 23, 2025, 9:07pm

Thanks, I will look into that. Is there a scheduler in Couchbase that I haven’t been able to find?

mreiche · June 23, 2025, 9:24pm

Auto-compaction - couchbase-manual-2.2 or command-line couchbase-manual-2.2
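If a time period is configured in the auto-compaction settings, it is worth checking whether it covers the crash time. The comparison has to handle periods that wrap past midnight. A minimal sketch, assuming the period is expressed as a from/to time of day; the window values below are made up, so substitute whatever the bucket settings actually show.

```python
from datetime import time


def in_window(t, start, end):
    """Return True if t falls in [start, end), including windows that wrap
    past midnight (for example 23:00 -> 03:00)."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


# Hypothetical window; replace with the configured from/to times.
crash_time = time(2, 0)
window_start, window_end = time(0, 0), time(6, 0)
print(in_window(crash_time, window_start, window_end))
```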

mreiche · June 23, 2025, 9:30pm

This sounds suspect …

"The server has a process that will periodically scan every key in RAM and compile them into a log, named access.log as well as maintain a backup of this access log, named access.old. The server can use this backup file during warmup if the most recent access log has been corrupted during warmup or node failure. By default this process runs initially at 2:00 GMT and will run again in 24-hour time periods after that point. You can configure this process to run at a different initial time and at a different fixed interval."

You could try deleting access.log and access.old in case the problem is that they are corrupt. You could also try setting alog_sleep_time to a large value like 525600 minutes (a year).
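The schedule described in the manual quote (first run at a fixed GMT hour, then repeating at a fixed interval) can be sketched as below, mostly to check the arithmetic: the default 24-hour period is 1440 minutes, and 365 * 24 * 60 = 525600 minutes, so the suggested sleep-time value pushes the second run a full year out. The function and parameter names are hypothetical, the "next occurrence of the start hour" rule is an assumption about how the first run is chosen, and Swedish summer time is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_SLEEP_MINUTES = 24 * 60              # 1440: the default 24-hour period
ONE_YEAR_MINUTES = 365 * 24 * 60             # 525600: the value suggested above
CEST = timezone(timedelta(hours=2), "CEST")  # Swedish summer time, UTC+2


def access_scanner_runs(task_hour_gmt, sleep_minutes, after, count=3):
    """Hypothetical model of the access-log scanner schedule: the first run is
    the next occurrence of task_hour_gmt (GMT) after `after`, and each later
    run follows sleep_minutes after the previous one."""
    first = after.astimezone(timezone.utc).replace(
        hour=task_hour_gmt, minute=0, second=0, microsecond=0)
    if first <= after:
        first += timedelta(days=1)
    return [first + i * timedelta(minutes=sleep_minutes) for i in range(count)]


start = datetime(2025, 6, 24, 12, 0, tzinfo=CEST)
for run in access_scanner_runs(2, DEFAULT_SLEEP_MINUTES, start):
    print(run.astimezone(CEST).isoformat())  # default: 04:00 local each day
for run in access_scanner_runs(2, ONE_YEAR_MINUTES, start, count=2):
    print(run.astimezone(CEST).isoformat())  # second run a year later
```

With the default start hour of 2:00 GMT, the runs land at 04:00 Swedish summer time, not at the 2 AM crash time.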

tech_johan · June 24, 2025, 4:00pm

Thanks, that really looks interesting. Deleting the access.log file didn’t help. I’m now trying to find out what I can about the timing parameters. As far as I can tell they are not set, which according to the manual should mean 2AM UTC. That is 4AM here in Sweden, so it doesn’t line up with the crash at 2AM.

mreiche · June 24, 2025, 5:24pm

Maybe at one point someone set the start time to 4AM Sweden time?

mreiche · June 25, 2025, 5:10pm

Look in the memcached log files to see whether it printed anything.
If this is Linux, ensure that memcached can create a core file, and collect a call stack from it. To do so, change the startup script (couchbase-server.sh), add something like ulimit -c unlimited, and wait for the next crash. Then run gdb /opt/couchbase/bin/memcached corefile and, at the gdb prompt, execute thread apply all bt.
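The last step (opening the core file and running thread apply all bt) can also be done non-interactively with gdb’s -batch and -ex options, which makes it easy to save the output to a file. The sketch below wraps that in Python; the function names and the core file path are placeholders, and it assumes gdb is installed on the Linux host.

```python
from __future__ import annotations

import shutil
import subprocess


def backtrace_command(binary: str, core: str) -> list[str]:
    """Build a non-interactive gdb invocation equivalent to opening the core
    file and typing 'thread apply all bt' at the gdb prompt."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


def collect_backtrace(binary: str, core: str) -> str:
    """Run gdb against the core file and return its standard output."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb is not installed on this host")
    result = subprocess.run(backtrace_command(binary, core),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

Usage, with a placeholder core path: print(collect_backtrace("/opt/couchbase/bin/memcached", "/path/to/core")).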

system · September 23, 2025, 5:11pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.