Couchbase 2.5.0 "CRASH REPORT"s with subsequent connection timeouts

Couchbase version: 2.5.0
RAM Overview: Total Allocated (1 GB) Total in Cluster (16 GB) In Use (33.8 MB) Unused (990 MB) Unallocated (15 GB)
Disk Overview: Usable Free Space (40.6 GB) Total Cluster Storage (44.1 GB) In Use (82.5 MB) Other Data (3.45 GB) Free (40.6 GB)
Active Servers: 1 Servers Failed Over: 0 Servers Down: 0 Servers Pending Rebalance: 0
Number of buckets: 1 Item Count: 5369 Number of views: 4

Our Couchbase server is logging unexpected "CRASH REPORT"s (below), and our app server then reports connection timeouts when trying to connect to it for the next 8 to 12 hours.

During the apparent crash, these logs are generated exactly every 5 seconds for about 6 minutes. No suspicious logs precede the crash, and no errors appear in the log during this downtime, although other messages like “Compacting” and “data_size” continue to be logged.

What is happening, and how do we prevent it? Do “timeout,” “gen_server,” and “dir_size” give any hint about the cause?

[ns_server:info,2015-04-27T8:37:36.714,ns_1@<0.17870.785>:compaction_daemon:try_to_cleanup_indexes:650]Cleaning up indexes for bucket `mystuff`
[ns_server:info,2015-04-27T8:37:36.723,ns_1@<0.17870.785>:compaction_daemon:spawn_bucket_compactor:609]Compacting bucket mystuff with config:
[error_logger:error,2015-04-27T8:37:44.971,ns_1@<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server 'couch_stats_reader-mystuff' terminating
** Last message in was refresh_stats
** When Server state == {state,"mystuff",1430123854959,
** Reason for termination ==
** {timeout,{gen_server,call,

=========================CRASH REPORT=========================
    initial call: couch_stats_reader:init/1
    pid: <0.7621.689>
    registered_name: 'couch_stats_reader-mystuff'
    exception exit: {timeout,
      in function  gen_server:terminate/6
    ancestors: ['single_bucket_sup-mystuff',<0.6601.0>]
    messages: [refresh_stats]
    links: [<0.6602.0>,<0.6447.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 3301239623

=========================SUPERVISOR REPORT=========================
     Supervisor: {local,'single_bucket_sup-mystuff'}
     Context:    child_terminated
     Reason:     {timeout,
     Offender:   [{pid,<0.7621.689>},

=========================PROGRESS REPORT=========================
          supervisor: {local,'single_bucket_sup-mystuff'}
             started: [{pid,<0.17936.785>},


Can you tell me a bit more about the operations that are timing out? What are they trying to do?

Also, what does your disk setup look like? Are you running from a SAN, or perhaps in a cloud provider without provisioned IOPS?



I checked our /var/log/syslog and discovered that our NFS server was reported down during the exact same 6-minute timespan. There were no external queries at that moment. I don’t yet know where our Couchbase instance is pointed at NFS mounts, but, apart from the opaque error message, this is certainly not a Couchbase problem.
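
For anyone who lands here with the same symptom, here is a minimal sketch of how one might check whether a given directory lives on an NFS mount. It assumes a Linux host with the usual /proc/mounts layout (device, mount point, filesystem type, ...). The function names are hypothetical helpers, and the path in the usage note is my assumption of the default Linux data directory; substitute whatever data and index paths your node is actually configured with.

```python
import os

# Filesystem type names that indicate an NFS mount in /proc/mounts.
NFS_TYPES = {"nfs", "nfs4"}


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def mount_for_path(path, entries):
    """Return the entry whose mount point is the longest prefix of path, or None."""
    path = os.path.normpath(path)
    best = None
    for device, mount_point, fs_type in entries:
        mp = mount_point.rstrip("/") or "/"
        # Match on whole path components so /mnt/nfs does not match /mnt/nfsx.
        if path == mp or mp == "/" or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best[1]):
                best = (device, mp, fs_type)
    return best


def is_on_nfs(path, entries):
    """True if the mount that contains path has an NFS filesystem type."""
    match = mount_for_path(path, entries)
    return match is not None and match[2] in NFS_TYPES


def check_local(path, mounts_file="/proc/mounts"):
    """Resolve symlinks in path, then report its mount entry and NFS status."""
    with open(mounts_file) as f:
        entries = parse_mounts(f.read())
    real = os.path.realpath(path)
    return mount_for_path(real, entries), is_on_nfs(real, entries)
```

Calling `check_local("/opt/couchbase/var/lib/couchbase/data")` on the Couchbase host returns the matching mount entry and a boolean. Symlinks are resolved first, since a data directory can be symlinked onto a network share without that being obvious from the configured path.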

Thank you for having a look at this!