Index mutations keep getting stuck

In our test environment we keep seeing index mutations become stuck, particularly on one bucket (ticket_bucket), but as you can see it's also starting to occur on other buckets:

It's a single node running Couchbase 6.5.1, deployed with the Autonomous Operator on an Azure Kubernetes cluster.

Relevant logs:
logs.zip (3.9 MB)

This is a quiet, low-volume system, so it isn't a performance issue. It's the second time we've seen it (the first time, we blew the environment away and started again). Any help would be greatly appreciated.

I deleted the ticket_bucket primary index and tried to recreate it, but got this:
“GSI CreatePrimaryIndex() - cause: Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, indexer failure, or presence of duplicate index name.”
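
For reference, the drop/recreate was along these lines (a sketch using the Query REST API on port 8093; <username>, <password>, and <query_node> are placeholders, not our real values):

# Drop and recreate the primary index on ticket_bucket via the Query service
curl -u <username>:<password> http://<query_node>:8093/query/service --data-urlencode 'statement=DROP PRIMARY INDEX ON `ticket_bucket`'
curl -u <username>:<password> http://<query_node>:8093/query/service --data-urlencode 'statement=CREATE PRIMARY INDEX ON `ticket_bucket`'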

@pc, thanks for sharing the logs. It looks like one of the projectors (the process that forwards mutations from the data service to the index service) is stuck and not responding. You can kill the projector process to unblock it (a sketch follows the log excerpt below). Also, if you can share the projector log file, we can check what the issue is with the projector.

2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.258388772s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket location_bucket
2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m32.640886572s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket prediction_bucket
2021-06-04T10:25:42.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.047651308s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket weather_bucket
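
For example, something along these lines should restart just the projector (a sketch; the pod name cb-0000 and namespace test-tdm are taken from the hostnames in the logs above, and it assumes pkill is available in the container image):

# Kill only the projector process inside the Couchbase container;
# Couchbase should restart it automatically
kubectl exec -n test-tdm cb-0000 -- pkill -f projector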

Thanks for your response.

I've killed the projector a couple of times and left it for about 10 minutes, but it hasn't made any difference to the outstanding index mutations.

The projector log which includes one of the projector restarts is here:
projector.zip (1.3 MB)

If I kill the whole Kubernetes pod and wait for it to come back, it processes all the mutations before becoming stuck again.
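
For completeness, this is roughly what I'm doing (same assumed pod/namespace names as in the logs above; the Autonomous Operator recreates the pod):

# Delete the pod and let the Operator bring it back
kubectl delete pod cb-0000 -n test-tdm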

@pc, after the restart I don't see pending mutations in the projector logs:

2021-06-15T08:38:46.208+00:00 [Info] KVDT[<-ticket_bucket<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2] ##1d stats: {"numDocsPending":0}
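
(If you want to check this yourself, a quick filter over the projector log works; the path below assumes a default in-container install:)

# Show recent pending-mutation stats from the projector log
grep numDocsPending /opt/couchbase/var/lib/couchbase/logs/projector.log | tail -n 20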

If you can capture both the projector and indexer log files when you see the pending mutations in the UI, we can correlate them and try to figure out the problem.

logs.zip (2.7 MB)

Here are the logs for both the projector and the indexer. I killed the projector at around 8:57.

Mutations in the UI:

@pc ,

It appears that you have run into a known bug. From the projector logs, the control channel is full. This can happen when there are more than 10 buckets in the cluster:

2021-06-17T08:58:25.310+00:00 [Warn] FEED[<=>MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2(127.0.0.1:8091)] ##15 control channel has 10000 messages

This issue has been fixed in 6.6.0 and later versions. As a workaround, please change the setting "projector.backChanSize" to 50000, e.g.:

curl -u <username>:<password> http://<indexer_ip>:9102/settings -X POST -d '{"projector.backChanSize":50000}'
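
To confirm the change took effect, you can read the settings back from the same endpoint (same placeholder credentials and address as above):

# Read back indexer settings and pick out the projector.backChanSize value
curl -s -u <username>:<password> http://<indexer_ip>:9102/settings | grep -o '"projector.backChanSize":[0-9]*'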

After the setting change, please restart the projector. After the restart, you should see the message below in the projector logs:

settings projector.backChanSize will updated to 50000

Thanks,
Varun

That did the trick! Thanks.

Will this setting persist? And if I were to scale out my Couchbase deployment, would the other servers also have this new setting?

@pc,

Apologies for the delayed reply. I somehow missed this notification.

Will this setting persist?

Yes, this setting would persist.

If I were to scale out my Couchbase deployment, would the other servers also have this new setting?

Yes. New servers would also have this new setting.

Thanks,
Varun
