Couchbase pod restarts because of memcached

We are running a 5-node Couchbase cluster in a Kubernetes environment, with a persistent volume attached to each node through an external GlusterFS storage server. After two days of a load run, one of the Couchbase Server pods restarted. Analysing the logs from around the crash, the restart looks to be due to ‘memcached’ restarting, but I am not able to figure out why memcached would have restarted.

The couchbase-0 pod got restarted on 23-Apr-2021 at 18:39 UTC. From the logs at that time, I infer that memcached crashed, which should have triggered the pod restart. The relevant log snippets are below:

debug.log
149917 [ns_server:debug,2021-04-23T18:39:49.854Z,ns_1@cas-couchbase-0.cas-couchbase-service:memcached_config_mgr <0.903.0>:memcached_config_mgr:init:48]ns_ports_setup seems to be ready
149918 [user:debug,2021-04-23T18:39:49.896Z,ns_1@cas-couchbase-0.cas-couchbase-service: <0.629.0>:ns_log:crash_consumption_loop:70] Service ‘memcached’ exited with status 0. Restarting. Messages:
149919 2021-04-23T18:39:49.241725+00:00 WARNING 199: Invalid password specified for [@ns_server] UUID:[70ec218d-dc1a-470b-72c1-1aa8b8fb0ae4]
149920 2021-04-23T18:39:49.343462+00:00 WARNING 200: Invalid password specified for [@ns_server] UUID:[56b4b926-72a6-431f-b2e0-e224a5d0aa13]
149921 2021-04-23T18:39:49.352313+00:00 WARNING 201: Invalid password specified for [@ns_server] UUID:[45558727-8cf3-405b-6b0b-9b64f8034559]
149922 2021-04-23T18:39:49.383944+00:00 WARNING 202: Invalid password specified for [@ns_server] UUID:[e8036490-b088-4558-3002-139dc1153bef]
149923 2021-04-23T18:39:49.555734+00:00 WARNING 203: Invalid password specified for [@ns_server] UUID:[13928919-66de-4d23-37dd-3b9d9320b88f]
149924 2021-04-23T18:39:49.669401+00:00 WARNING 203: Invalid password specified for [@ns_server] UUID:[9011fbe2-925e-4578-cc55-9b632407b534]
149925 2021-04-23T18:39:49.785833+00:00 WARNING 204: Invalid password specified for [@ns_server] UUID:[fa6c7f21-3377-40c9-1aab-3730709ed6e9]

149926 EOL on stdin. Initiating shutdown
reports.log
7449 [error_logger:error,2021-04-23T18:39:45.984Z,ns_1@cas-couchbase-0.cas-couchbase-service:error_logger <0.6.0>:ale_error_logger_handler:do_log:203]
7450 =========================CRASH REPORT=========================
7451 crasher:
7452 initial call: gen_event:init_it/6
7453 pid: <0.287.0>
7454 registered_name: bucket_info_cache_invalidations
7455 exception exit: killed
7456 in function gen_event:terminate_server/4 (gen_event.erl, line 320)
7457 ancestors: [bucket_info_cache,ns_server_sup,ns_server_nodes_sup,
7458 <0.168.0>,ns_server_cluster_sup,<0.89.0>]
7459 messages:
7460 links:
7461 dictionary:
7462 trap_exit: true
7463 status: running
7464 heap_size: 376
7465 stack_size: 27
7466 reductions: 133
7467 neighbours:
debug.log
114651 [ns_server:warn,2021-04-23T18:39:10.789Z,nonode@nohost:dist_manager<0.141.0>:dist_manager:wait_for_address:118] Could not resolve address cas-couchbase-0.cas-couchbase-service: nxdomain
145240 [ns_server:warn,2021-04-23T18:39:40.630Z,ns_1@cas-couchbase-0.cas-couchbase-service:memcached_refresh <0.173.0>:ns_memcached:connect:1227] Unable to connect: {error,{badmatch,{error,econnrefused}}}.

indexer.log
52870 2021-04-23T18:39:43.111+00:00 [Warn] Indexer Failure to Init Get http://127.0.0.1:8091/_metakv/indexing/settings/config: Unable to find given hostport in cbauth database: `127.0.0.1:8091’
52871 2021-04-23T18:39:43.111+00:00 [Info] Indexer exiting normally

info.log
292399 [ns_server:warn,2021-04-23T18:39:38.625Z,ns_1@cas-couchbase-0.cas-couchbase-service:memcached_refresh <0.173.0>:ns_memcached:connect:1227]Unable to connect: {error,{badmatch,{error,econnrefused}}}.

query.log
134060 _time=2021-04-23T18:39:48.572+00:00 _level=ERROR _msg=Cannot connect url http://127.0.0.1:8091 - cause: Get http://127.0.0.1:8091/pools: dial tcp 127.0.0.1:8091: getsockopt: connection refused
134061 _time=2021-04-23T18:39:48.572+00:00 _level=ERROR _msg=Shutting down.
134062 [goport(/opt/couchbase/bin/cbq-engine)] 2021/04/23 18:39:48 child process exited with status 1
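
To double-check how Kubernetes itself recorded this restart (for example OOMKilled versus a plain container exit), the previous container state of the pod can be inspected. A minimal sketch of that check is below; the pod name is from our setup, while the namespace is just a placeholder:

# Minimal sketch: read the previous termination state of the restarted pod.
# The namespace ("default") is a placeholder - adjust for the real deployment.
import subprocess

POD = "cas-couchbase-0"   # the pod that restarted, from the logs above
NAMESPACE = "default"     # placeholder: replace with the real namespace

# Ask Kubernetes for the last terminated state of the pod's containers; this
# includes exitCode, reason (e.g. OOMKilled) and finishedAt, which should line
# up with the 18:39 UTC timestamps above.
out = subprocess.run(
    ["kubectl", "get", "pod", POD, "-n", NAMESPACE,
     "-o", "jsonpath={.status.containerStatuses[*].lastState.terminated}"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(out or "no previous termination state recorded")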

Can you please check the above log snippets and suggest what could have gone wrong?

Thanks,
Ganesh.

Can you collect the full set of logs with the cbopinfo tool?
https://docs.couchbase.com/operator/current/howto-troubleshooting.html#couchbase-server-logs

It will also collect other information, such as the server version, which you’ve not indicated here.
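
If cbopinfo is not an option, you can usually still run cbcollect_info inside the affected pod and copy the archive out, roughly along the lines of the sketch below (the pod name, namespace and output path are placeholders; the cbcollect_info path assumes the standard /opt/couchbase install):

# Rough sketch: run cbcollect_info inside the pod and copy the archive out.
# Pod name, namespace and the /tmp output path are placeholders.
import subprocess

POD = "cas-couchbase-0"        # placeholder: the affected pod
NAMESPACE = "default"          # placeholder: your namespace
ARCHIVE = "/tmp/cbcollect.zip"

# Run the collector inside the Couchbase container (this can take several minutes).
subprocess.run(
    ["kubectl", "exec", "-n", NAMESPACE, POD, "--",
     "/opt/couchbase/bin/cbcollect_info", ARCHIVE],
    check=True,
)

# Copy the resulting archive to the local machine so it can be shared.
subprocess.run(
    ["kubectl", "cp", f"{NAMESPACE}/{POD}:{ARCHIVE}", "cbcollect.zip"],
    check=True,
)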

Hello Patrick,
Thanks for the quick response. We run Couchbase Server 6.0 on CentOS 7 in a Kubernetes environment (docker/Dockerfile at master · couchbase/docker · GitHub).

  1. I have uploaded the logs from the failed pod to Google Drive and provided the link below. The pod has its log folder mounted to a persistent volume through a PVC. I am not able to run the utility you suggested to collect the logs, as we have already restarted the pod.

Thanks in advance for your time and efforts.

  2. Meanwhile, I have a question related to index memory. We allocate 2 GB for indexer memory in our setup. We have observed that when the bucket RAM gets full, the excess data is pushed to disk. Similarly, if the indexer RAM runs out, will the index data be pushed to disk, or will the indexing service malfunction?

  3. I am not able to find any documentation that might help in tuning the indexer RAM size. If you could provide any pointers here, it would be highly appreciated. (A sketch of the REST call I understand can be used to adjust the quota follows this list, in case that is the right lever.)
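
For context on point 3, my understanding from the docs is that the index memory quota is a cluster-wide setting, in MB, that can be read and changed over the REST API on port 8091. A minimal sketch, assuming admin credentials and a reachable node; the endpoint and the indexMemoryQuota field are my reading of the 6.0 REST API, so please correct me if that is not the right lever:

# Sketch only: /pools/default and the indexMemoryQuota field are my
# understanding of the 6.0 REST API - verify against the docs for your version.
import requests

HOST = "http://cas-couchbase-0.cas-couchbase-service:8091"  # placeholder node address
AUTH = ("Administrator", "password")                        # placeholder credentials

# Read the current cluster-wide quotas (values are reported in MB).
info = requests.get(f"{HOST}/pools/default", auth=AUTH).json()
print("current indexMemoryQuota (MB):", info.get("indexMemoryQuota"))

# Raise the index service quota, e.g. from 2048 MB to 3072 MB.
resp = requests.post(f"{HOST}/pools/default", auth=AUTH,
                     data={"indexMemoryQuota": 3072})
resp.raise_for_status()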

Thanks,
Ganesh.

Can I check, are you using the operator or manually orchestrating Couchbase Server pods on Kubernetes?
If you’re using the operator then you must have an EE licence, so please raise a support request so we can prioritise resources to help you out: https://www.couchbase.com/support/working-with-technical-support

I’m afraid I work on the cloud-native side, so I do not know the details of what to configure on the indexer directly. Hopefully someone else can pick that up for you; otherwise, have a look at the documentation for the version you’re deploying - everything configurable should be documented.

Hi Patrick,

Thanks for the response. We don’t have an EE license. We have a home-grown ‘Couchbase operator’ pod that brings up the Couchbase cluster.

I want to understand more about the indexing service’s functionality and its resource handling in the Community Edition. Can you please point me to any relevant material?

Thanks,
Ganesh.

The docs have all the information in them, so I would start here: Indexing | Couchbase Docs

I would say raise any specific issues as a separate post so it’s more obvious.