Indexer service looping between Warmup and Ready state, cannot run queries after VM restart

gmaggini · May 8, 2020, 11:17am

Hi everyone, after a hard shutdown of the VM (power loss) running Couchbase server, the queries on Couchbase cluster (single-node) do not run as the Indexer service continuously loops between “Warming Up” and “Ready” state.
It may be a reason of indexes having some problems in that situation.

Question: apart from enhancing the reliability of the VMs so they do not get turned off abruptly, and eventually adding other nodes, what other approach do you recommend for enhancing the reliability of indexes? I would expect Couchbase to be reliable in these kind of scenarios, although rare, hence perhaps you have recommendation on best configuration to use.

We use Couchbase Community 5.1.1.

Thank you !

varun.velamuri · May 8, 2020, 1:44pm

@gmaggini,

It is not expected of the indexer process to loop between “Warmup” and “Ready” state. Usually, it comes back to “Ready” state and stays there unless there is a crash. Can you kindly share the indexer log so that we can try to root cause the issue.

You can find the indexer logs at /opt/couchbase/var/lib/couchbase/logs/

Having index replicas would ease your problem. For a query, If indexes on one node are not available, then it will be picked up from the other available nodes. Please refer to this blog article: Couchbase Index Replicas | Drop Indexes | Couchbase for more details on index replicas.

Thanks,
Varun

varun.velamuri · May 11, 2020, 8:57am

@gmaggini,

On couchbase community edition, index replicas are not supported. You can create equivalent indexes on other index nodes to have better index availability.

Thanks,
Varun

gmaggini · May 20, 2020, 12:22am

@varun.velamuri thank you for your answer!
Here is additional information after research.

When the problem describes happens (indexes moving between Ready and Warm Up), there are 2 categories of errors in indexer.log.

The first one (repeating) is:

2020-05-13T10:41:00.514-07:00 [Warn] Indexer::validateIndexInstMap Bucket mybucket is not yet ready (err = MCResponse status=KEY_ENOENT, opcode=0x89, opaque=0, msg: ) Retrying(5)..
2020-05-13T10:41:00.713-07:00 [Error] StartDcpFeedOver(): MCResponse status=KEY_ENOENT, opcode=0x89, opaque=0, msg:

the second type of error (also repeating) is:

'd:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188', errno = 32: 'The process cannot access the file because it is being used by another process.'2020-05-13T10:47:05.722-07:00 [ERRO][FDB] Successfully used partially compacted file 'd:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.189' for recovery replacing old file d:\Couchbase\var\lib\couchbase\data\@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188.

Observations:

System has 16 GB of RAM and sufficient disk space
Data service 6 GB
Index service 2 GB
Full-text service 4 GB
2 GB for Windows
No sign of performance degradation other than the error discussed here
5 buckets were configured
buckets are usually not very large (typically < 10.000 documents), only 1 bucket has 700.000 documents * It has 1 large GSI secondary index of 700 MB and many others of 20 MB each
GSI indexing is set to “circular write”, compaction set every day 00:00 (default)

We are now currently looking into dropping the large secondary index, but not sure whether this could be the problem.
We are also considering increasing the memory for Indexer service, but we could not find any sizing recommendations for Index service (only for data service is well explained here: Sizing Guidelines | Couchbase Docs)

Do you have any recommendation? And what could be causing the error reported, from your experience?

thank you

sduvuru · May 20, 2020, 11:34pm

The log message
Successfully used partially compacted file ‘d:\Couchbase\var\lib\couchbase\data@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.189’ for recovery replacing old file d:\Couchbase\var\lib\couchbase\data@2i\business_idx_ic_terminal_status_10511548744874879119_0.index\data.fdb.188.
indicates there is a successfully compacted file that was created and before switching over to that file there was an issue and recovery had to be done. When recovery opened the last forestdb file it found the successfully compacted file and switched over to it. So, the compacted file will be used as the current file. But it looks like the open is getting called again and again and I am not sure why that is happening.

varun.velamuri · May 21, 2020, 7:29am

@gmaggini,

I think we need the complete indexer logs to analyze the sequence of events to debug this issue further.

Also, if you have multiple nodes in the cluster, you can try to failover this node and rebalance-in again into the cluster. This would get the indexer out of this situation. Note that the existing indexes on this node would be dropped when failover and rebalance-in happens. So, you will have to create the indexes again.

Thanks,
Varun

gmaggini · May 26, 2020, 12:41pm

@varun.velamuri thank you. Is it possible to send you the log files privately instead of attaching them?
Regarding your note: we do have a single node per cluster.

@sduvuru do you think this problem may be somehow linked to this commit? (coincidentally, by you! )

github.com/couchbase/forestdb

MB-35666: Forestdb: Renaming a compacted file can lead to the file

committed 10:53PM - 27 Aug 19 UTC

sduvuru

+339 -5

getting deleted When a file is compacted the older file is deleted when there a…re no references to it. This causes an issue when the compacted file is renamed to the original file and then backed up. When the backed up file is opened, the original file gets deleted as the name of the previous file for the compacted file matches the renamed file. The fix is to not remove the old file if it is in a different dirrectory. Change-Id: I7d3fc397768483976567774f79c49882847aba18 Reviewed-on: http://review.couchbase.org/113762 Tested-by: Build Bot <build@couchbase.com> Reviewed-by: Srinath Duvuru <srinath.duvuru@couchbase.com>

As this commit is from August 2019, for sure it’s not included in CB Community 5.1.1, right? In which version would it be included?

Thank you!

varun.velamuri · May 28, 2020, 3:59am

@gmaggini,

You can send a mail to varun [dot] velamuri [at] couchbase [dot] com, attaching the log files . If the file size is too large, please let me know. I will create.a google drive link and share it with you.

Thanks,
Varun

Topic		Replies	Views
Indexer warming up forever Couchbase Server	7	2525	June 8, 2019
Indexer In Warmup State. Please retry the request later. from 127.0.0.1:9101 Couchbase Server query	5	3824	July 1, 2017
Indexer In Warmup State. Please retry the request later. from <IP Address>:9101 Couchbase Server	6	2929	February 28, 2019
Indexes fail and must be recreated SQL++ index	6	1969	January 16, 2019
Indexer restarts Couchbase Server	0	1488	December 23, 2016

Indexer service looping between Warmup and Ready state, cannot run queries after VM restart

Related topics