Why did my Couchbase crash?

Joshua_Fox · May 1, 2019, 2:01pm

My Couchbase server, running on Google Compute Engine, became completely unresponsive to both Couchbase access and SSH. I restarted the server and Couchbase works again. How can I diagnose this?

At the time, Couchbase was building indexes that had previously been created with defer_build: true.

I don’t think I was overloading the server. The Dashboard (last I checked) showed plenty of unused memory quota left from the 10 GB RAM, and only 1% of the disk is in use. The machine has 2 vCPUs.

The bucket has 1.7 million documents. 185 indexes were created. Almost all of these actually index only a small fraction of the documents, as they have a WHERE _type= filter. After restart, approximately 70 indexes are not yet fully built, and the indexing GUI says “Warning: Cannot communicate with indexer process.”

The logs show “invalid memory address”, but I don’t know if that gets at the root cause.

```
2019-05-01T11:45:50.623+00:00 WARN CBAS.cbas restarting metakv.RunObserveChildren on /cbas/ due to error: Get http://127.0.0.1:8091/_metakv/cbas/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
2019-05-01T11:45:50.624+00:00 WARN CBAS.cbas restarting metakv.RunObserveChildren on /cbas/ due to error: Get http://127.0.0.1:8091/_metakv/cbas/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
2019-05-01T11:45:50.751+00:00 WARN CBAS.cbas TLS config has been refreshed by ns server
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x7f7ea2]

goroutine 72010 [running]:
main.postTLSConfig(0x0, 0x0)
goproj/src/github.com/couchbaselabs/cbas/cbas/security.go:115 +0x422
main.notifyTLSConfigChange()
goproj/src/github.com/couchbaselabs/cbas/cbas/security.go:87 +0x62
created by main.registerTLSRefreshCallback.func ...
ns_log 000
ns_1@127.0.0.1
11:45:51 AM   Wed May 1, 2019
Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'. Version: "6.0.1-2037-enterprise".
menelaus_sup 001
ns_1@127.0.0.1
11:45:50 AM   Wed May 1, 2019
Shutting down bucket "fromds" on 'ns_1@127.0.0.1' for server shutdown
ns_memcached 000
ns_1@127.0.0.1
11:45:49 AM   Wed May 1, 2019
Shutting down bucket "metricsBucket" on 'ns_1@127.0.0.1' for server shutdown
ns_memcached 000
ns_1@127.0.0.1
11:45:48 AM   Wed May 1, 2019
```

![failure|398x500](upload://uV6sDDQigiTD778z00xomrboLup.png)

mihir.kamdar · May 1, 2019, 5:07pm

Hi @Joshua_Fox

This error does not seem to be connected with the issue you are seeing. Can you please help us with the following information, so that we can investigate further:

What version of Couchbase Server are you using? Is it Community Edition or Enterprise Edition?
I assume you have just 1 node in your cluster. What services have been initialized on this node, and what is the memory allocation?
Can you pls upload the logs from the Couchbase server? It would be super useful to have the logs and analyze them.

raju · May 1, 2019, 9:01pm

@Joshua_Fox Apart from answering @mihir.kamdar's questions above, can you also tell us:
Why are you creating that many indexes? For 1.7 million documents, 185 indexes seems like too many. What is your use case?

Joshua_Fox · May 2, 2019, 5:54am

What version …
Dashboard shows “Enterprise Edition 6.0.1 build 2037 ‧ IPv4”
1 node in your cluster.
Yes. I am now in development and evaluation, not yet in production.
what is the memory allocation.
The Dashboard shows this:

Data Service Memory total quota (8.74 GB)
in use (2.64 GB) unused quota (6.1 GB) unallocated (0 B)

pls upload the logs from the couchbase server
How do I do this? My original posting shows all logs from the time of the crash.

Joshua_Fox · May 2, 2019, 6:08am

Currently, this is just experimentation with a small fraction of the data and indexes.

The actual use case may involve 0.5-1 billion documents. Certainly the application should not seize up under the current load, so understanding what happened is important.

As to the number of indexes: The application uses hundreds of dynamically defined queries. In the Google Datastore approach, indexes are needed for most new queries. Couchbase is more flexible, but I think that hundreds of indexes will still be needed. (Note that in a typical design which distinguishes “tables” with a type field, indexes will be per-“type”, reducing the burden on the indexing service but requiring more indexes.)
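
To make the per-“type” index design concrete, here is a minimal sketch that generates the N1QL statements for partial indexes with deferred builds, and then groups the later BUILD INDEX calls into small batches. The bucket name `fromds` comes from the logs above; the type `Order`, the field `customerId`, the index naming scheme, and the batch size are all hypothetical. Batching the builds is only an assumption about how to limit concurrent work on a small node, not something shown in this thread to prevent the crash.

```python
import json


def partial_index_ddl(bucket, doc_type, field):
    """Return a CREATE INDEX statement for one document type, build deferred."""
    # Hypothetical naming scheme: idx_<type>_<field>.
    name = f"idx_{doc_type}_{field}"
    options = json.dumps({"defer_build": True})
    return (
        f"CREATE INDEX `{name}` ON `{bucket}`(`{field}`) "
        f'WHERE `_type` = "{doc_type}" WITH {options};'
    )


def build_statements(bucket, index_names, batch_size=5):
    """Yield BUILD INDEX statements that cover index_names in small batches."""
    for start in range(0, len(index_names), batch_size):
        batch = index_names[start:start + batch_size]
        names = ", ".join(f"`{n}`" for n in batch)
        yield f"BUILD INDEX ON `{bucket}`({names});"


if __name__ == "__main__":
    print(partial_index_ddl("fromds", "Order", "customerId"))
    for stmt in build_statements("fromds", ["idx_a", "idx_b", "idx_c"], batch_size=2):
        print(stmt)
```

Each generated statement would still have to be run through the query service (for example with cbq or an SDK), waiting for one batch to come online before issuing the next.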

Joshua_Fox · May 2, 2019, 6:38am

Note another strange warning, about the disk, first occurring this morning. (There has been no load other than ongoing indexing since my original message.) This appears unrelated, so I put it in a separate question.

mihir.kamdar · May 9, 2019, 10:38pm

Hi @Joshua_Fox: You can find more details about uploading log files here: cbcollect_info | Couchbase Docs
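
For reference, a minimal sketch of running cbcollect_info from a script on the node. It assumes a default Linux install, where the tool lives at /opt/couchbase/bin/cbcollect_info, and that the script runs with enough privileges to read the server logs. The output path is a hypothetical example.

```python
import os
import subprocess

# Default install location on Linux; an assumption about this server.
CBCOLLECT = "/opt/couchbase/bin/cbcollect_info"


def cbcollect_command(output_zip, tool=CBCOLLECT):
    """Build the command line: cbcollect_info takes the output zip file as its argument."""
    return [tool, output_zip]


def collect_logs(output_zip, tool=CBCOLLECT):
    """Run cbcollect_info and return the path of the zip it was asked to write."""
    if not os.path.exists(tool):
        raise FileNotFoundError(f"cbcollect_info not found at {tool}")
    subprocess.run(cbcollect_command(output_zip, tool), check=True)
    return output_zip


if __name__ == "__main__":
    # Hypothetical output location; pick a disk with enough free space.
    print(collect_logs("/tmp/couchbase-logs.zip"))
```

The resulting zip file is what support would ask for. Collection can take several minutes on a busy node.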

Joshua_Fox · May 12, 2019, 6:33am

Thank you. I suggest that

Couchbase should stop when, for example, 3% of the disk remains.
When the disk fills up, Couchbase should clearly show “disk full” errors, not just the (valuable) advance warning.

This is because Couchbase fills the disk without active intervention (because of indexing).

After the disk is full, the system is borked.

At least Couchbase should stop at a recoverable state.
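
Until something like this exists in the server, the 3% rule can be approximated from outside with a small watchdog. This is only a sketch of an external check and not a Couchbase feature. The 3% threshold comes from the suggestion above, while the data path is an assumption (the default Linux location) and what to do when the check fires is left open.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the disk that is still free, between 0.0 and 1.0."""
    return free_bytes / total_bytes


def disk_is_low(path, threshold=0.03):
    """True when the filesystem holding `path` has less than `threshold` free."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < threshold


if __name__ == "__main__":
    # Assumed default data directory on Linux; adjust for the actual install.
    data_path = "/opt/couchbase/var/lib/couchbase/data"
    if disk_is_low(data_path):
        print("Less than 3% of the disk is free; stop index builds or free space now.")
```

Run from cron or a systemd timer, this gives a chance to pause indexing while the node is still recoverable.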
