In Couchbase 7.0.0 Beta, indexes may get stuck in the “building 100%” state

We are adaptively creating indexes for new collections based on JVM type specifications.

We then check whether each index comes online. We noticed that indexes can get stuck indefinitely in the “building 100%” state.

We have 2 scopes and around 60 indexes in total in 1 bucket, with 150,000 documents on 2 nodes. The RAM quota is not used up for either data or index.
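
For anyone who wants to reproduce the online check described above, here is a minimal sketch of such a polling loop in Python. It assumes the index states come from a query such as `SELECT name, state FROM system:indexes` and that a ready index reports the state `online`. `fetch_states` is a hypothetical callable standing in for whatever SDK or REST call you use, and the timeout values are arbitrary.

```python
import time
from typing import Callable, Dict, List


def wait_until_online(
    fetch_states: Callable[[], Dict[str, str]],  # hypothetical: returns {index_name: state}
    timeout_s: float = 300.0,                    # arbitrary default, tune for your cluster
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until every index reports 'online' or the timeout expires.

    Returns the sorted names of indexes that are still not online.
    An empty list means all indexes came online in time.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = sorted(name for name, state in states.items() if state != "online")
        # Stop when nothing is pending, or report the stragglers at the deadline.
        if not pending or clock() >= deadline:
            return pending
        sleep(poll_s)
```

A non-empty result after the timeout is what we would treat as a stuck build; it does not by itself say why the index is stuck.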

@zoltan.zvara,

This could be a bug in the indexer code. Can you share all the indexer logs from the /opt/couchbase/var/lib/couchbase/logs directory?

Thanks,
Varun

@varun.velamuri thanks for helping out. Do these logs contain sensitive data? Is there any way to provide you with this data in private?

Our CI flushed the bucket. I will be on the lookout for these stuck building states, and if I find any I will try to extract the logs.

@zoltan.zvara,

Apologies for the delay. You can collect the logs with log redaction enabled, which will filter out all sensitive information. Please refer to https://docs.couchbase.com/server/7.0/manage/manage-logging/manage-logging.html for more information.

You can DM me the link to logs in the forums.

Thanks,
Varun

@varun.velamuri thanks, let me get back to you with the logs. It may take a while!

Indexes may be stuck in a building state as follows:

Meanwhile, the scope G7aa0vxt had also been deleted. These indexes were probably created in parallel and got duplicated.

In this case, the indexing service stops working and the cluster requires a restart since all queries start to fail. (After restarting the cluster, the indexes with the non-existent scope are gone.)

@zoltan.zvara It would be great if you could share the logs requested by Varun.

@jeelan.poola I have already shared the logs with Varun, and I got confirmation that they were downloaded and opened successfully.

Let me know if you need anything else.

@jeelan.poola @zoltan.zvara, Apologies for the confusion. I have the logs.

@zoltan.zvara,

Apologies for the long delay. The index build got stuck due to MB-43117, which was fixed after the beta release. The workaround in this case is to restart the indexer.

Thanks,
Varun

@varun.velamuri thanks, do you plan to publish couchbase-server-7.0.0-4131 or later to the Docker repository?

@zoltan.zvara, We will soon have a beta refresh with this issue fixed. I don’t yet know the exact date of that release. I will let you know once I have an official update.

Thanks,
Varun

Thanks @varun.velamuri, we are eagerly waiting for it. None of our tests have passed recently because the indexer service periodically crashes on our 4-node Beta 7.0.0 development server.

Service 'indexer' exited with status 134. Restarting. Messages:
2021-01-26T23:04:03.266+00:00 [Info] ForestDBSlice::Commit SliceId 0 IndexInstId 7732669426224168914 FlushTime 2.498µs CommitTime 439.235µs TotalFlushTime 160.121257ms TotalCommitTime 855.418814ms
2021-01-26T23:04:03.267+00:00 [Info] ForestDBSlice::OpenSnapshot SliceId 0 IndexInstId 7732669426224168914 Creating New Snapshot SnapshotInfo: seqnos: 165468, 165468, 37952 committed:true
2021-01-26T23:04:03.267+00:00 [Info] StorageMgr::handleCreateSnapshot Added New Snapshot Index: 7732669426224168914 PartitionId: 0 SliceId: 0 Crc64: 1258414697894122896 (SnapshotInfo: seqnos: 165468, 165468, 37952 committed:true) SnapCreateDur 88.879701ms SnapOpenDur 61.189µs
2021-01-26T23:04:03.385+00:00 [Info] ServiceMgr::rebalanceJanitor Running Periodic Cleanup
assertion failed [hbmeta.prefix != NULL] at /home/couchbase/jenkins/workspace/couchbase-server-unix/forestdb/src/hbtrie.cc:1267 ((nil) != 0x7f7748c1dbd0)
Breakpad caught a crash in forestdb. Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/55d37586-49cc-42f7-63fbbbac-c4a9e907.dmp before terminating.

There are more logs and definitely more details to this, but I want to know whether it is worth reporting in detail, or whether we should wait for the new build to be released.
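
To triage babysitter messages like the ones above, a small script can pull out the exit status, the failed assertion, and the crash dump path. This is only a sketch in Python, built from the patterns visible in the excerpt; the function name `summarize_crash` is hypothetical. Reading a status above 128 as 128 plus a signal number is the usual shell convention and is an assumption here, not something confirmed for the babysitter. Under that assumption, 134 maps to signal 6 (SIGABRT on Linux), which would be consistent with a failed assertion aborting the process.

```python
import re
from typing import Dict, Optional

# Patterns taken from the log excerpt in this thread; other builds may word these differently.
EXIT_RE = re.compile(r"Service '([^']+)' exited with status (\d+)")
ASSERT_RE = re.compile(r"assertion failed \[([^\]]+)\] at (\S+):(\d+)")
DUMP_RE = re.compile(r"Writing crash dump to (\S+\.dmp)")


def summarize_crash(log_text: str) -> Dict[str, Optional[object]]:
    """Extract the key facts of a service crash from a babysitter message."""
    summary: Dict[str, Optional[object]] = {
        "service": None, "status": None, "signal": None,
        "assertion": None, "source": None, "dump": None,
    }
    m = EXIT_RE.search(log_text)
    if m:
        status = int(m.group(2))
        summary["service"] = m.group(1)
        summary["status"] = status
        # Assumption: statuses above 128 encode 128 + signal number.
        if status > 128:
            summary["signal"] = status - 128
    m = ASSERT_RE.search(log_text)
    if m:
        summary["assertion"] = m.group(1)                   # the expression that failed
        summary["source"] = f"{m.group(2)}:{m.group(3)}"    # file:line of the assertion
    m = DUMP_RE.search(log_text)
    if m:
        summary["dump"] = m.group(1)                        # path to attach to a bug report
    return summary
```

Running this over the collected messages gives a short summary to attach to a report alongside the `.dmp` file.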

@zoltan.zvara, I think this crash is a new issue. We saw it in very early versions (4.5.0) but not since then.

@sduvuru Can you kindly let us know what information is required to debug this issue?

Thanks,
Varun

It looks like there is corruption. A prefix is expected at this level, but it is NULL. We have not seen this issue in our testing. If you have a repro, please share it so we can try to debug it. You can also try the forestdb_dump utility to see whether the on-disk image of the file is corrupted.

OK, we will attempt to reproduce it and then send you the dumps immediately.

Thank you, Varun!
Have you already fixed that crash?

@MirandaJO89,

We have already fixed the stuck-build issue. Regarding the crash, we have not observed it in our internal testing. If you are seeing the same crash on your end and have a repro, please share the steps to reproduce it so that we can try to debug it.

Also, like @sduvuru mentioned, you can try the forestdb_dump utility to check if the on-disk image file is corrupted.

Thanks,
Varun

We are on the lookout for this issue to come up again! I will get back to you with a report when it does!