All indexes dropped after hardware failure

Poltar · July 21, 2021, 12:15am

After what looks like a disk problem, I rebooted the system and found that Couchbase had automatically deleted all the indexes. I found no way to restore just the backed-up indexes, so I had to recreate them manually. This is extremely strange behaviour for a database: was there really no way to handle this case other than deleting all the indexes?
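Recreating indexes by hand can be made repeatable by keeping the index definitions as data and generating the DDL from them. The sketch below assumes a hypothetical list of saved definitions (index name, bucket, key expressions, optional WHERE clause). The record layout and helper names are illustrative only, not a Couchbase export format. It creates each index with `defer_build` and then issues one `BUILD INDEX` per bucket, so the deferred indexes are built together rather than one at a time.

```python
from collections import defaultdict

# Hypothetical saved definitions: this record layout is an assumption for the
# sketch, not a format Couchbase exports. "keys" holds raw N1QL expressions;
# index and bucket names are assumed not to contain backticks.
SAVED = [
    {"name": "idx_status", "bucket": "xxx", "keys": ["status"]},
    {"name": "idx_user_date", "bucket": "xxx", "keys": ["userId", "createdAt"],
     "where": 'status = "active"'},
]


def create_statement(defn: dict) -> str:
    """CREATE INDEX with defer_build, so nothing is built until BUILD INDEX."""
    keys = ", ".join(defn["keys"])
    stmt = f"CREATE INDEX `{defn['name']}` ON `{defn['bucket']}`({keys})"
    if defn.get("where"):
        stmt += f" WHERE {defn['where']}"
    return stmt + ' WITH {"defer_build": true}'


def build_statements(defns: list) -> list:
    """One BUILD INDEX per bucket, listing every deferred index in it."""
    by_bucket = defaultdict(list)
    for d in defns:
        by_bucket[d["bucket"]].append(f"`{d['name']}`")
    return [f"BUILD INDEX ON `{bucket}`({', '.join(names)})"
            for bucket, names in by_bucket.items()]


def recreate_script(defns: list) -> list:
    """All CREATE statements first, then the BUILD statements."""
    return [create_statement(d) for d in defns] + build_statements(defns)


for statement in recreate_script(SAVED):
    print(statement + ";")
```

The printed statements could then be run through cbq or the Query Workbench.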

Web console log:

Cleaup done for index xxx, partition id 0.
Detected storage corruption for index xxx, partition id 0. Starting cleanup.
(These messages repeat for each index until all of them are deleted.)

Indexer log:

[Error] plasmaSlice:NewplasmaSlice Id 0x693ac0 IndexInstId 12332122332123322123 fatal error occured: Unable to initialize xxx.index\mainIndex, err = Bucket/xxx/Mainstore#12332122332123322123:0 : fatal: read failure in recovery: EOF
[Error] plasmaSlice:NewplasmaSlice Id 0 IndexInstId 12332122332123322123 PartitionId 0 fatal error occured: Storage corrupted and unrecoverable
[Info] Bucket/xxx/Backstore#12332122332123322123:0 Plasma: Disable page eviction before reaching quota.
[Error] Indexer:: initPartnInstance storage corruption for indexInst
…
Indexer::initFromPersistedState Starting cleanup for PartitionId: 0 Endpoints: [:9105]
Indexer::forceCleanupIndexPartition 12332122332123322123 0 mark metadata as deleted
ClustMgr:handleCleanupPartition&{{9876499879879879879 xxx plasma Bucket acb327acb273acb19287acb390acaacb false [field] N1QL SINGLE [false] false false false 0 false 0 {true 0 0 0} 0 0 0 0 0 0 0 0 0 0} 12332122332123322123 0 0 true}
…

amit.kulkarni · July 22, 2021, 8:00am

Hi @Poltar ,

I am sorry for the problems you have faced.

This behaviour is expected when there is disk corruption. When the index service detects that an index's on-disk data is corrupted, it deletes that index so that the rest of the indexes can remain available.
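The startup behaviour visible in the indexer log can be modelled roughly as follows: try to open every persisted index partition, and if one turns out to be unrecoverable, drop its metadata and carry on with the others. This is a simplified sketch, not Couchbase code. The names `init_from_persisted_state`, `open_storage` and `StorageCorrupted` are hypothetical and only echo the log messages, and corruption is simulated with a flag.

```python
from dataclasses import dataclass


class StorageCorrupted(Exception):
    """Stands in for 'Storage corrupted and unrecoverable' in the indexer log."""


@dataclass
class Partition:
    index: str
    partition_id: int
    corrupt: bool = False  # simulated on-disk state


def open_storage(p: Partition) -> str:
    # Hypothetical stand-in for opening an index's storage. A real indexer
    # replays its on-disk files here, which is where an error such as
    # "read failure in recovery: EOF" would surface.
    if p.corrupt:
        raise StorageCorrupted(f"{p.index}:{p.partition_id}")
    return f"slice({p.index}:{p.partition_id})"


def init_from_persisted_state(partitions, metadata):
    """Open every persisted partition and clean up those that cannot be opened.

    metadata maps (index, partition_id) -> definition and is modified in place,
    mirroring "mark metadata as deleted" in the log.
    """
    opened, cleaned = {}, []
    for p in partitions:
        key = (p.index, p.partition_id)
        try:
            opened[key] = open_storage(p)
        except StorageCorrupted:
            metadata.pop(key, None)  # the definition is dropped with the data
            cleaned.append(key)
    return opened, cleaned


partitions = [Partition("idx_a", 0), Partition("idx_b", 0, corrupt=True),
              Partition("idx_c", 0)]
metadata = {(p.index, p.partition_id): f"definition of {p.index}" for p in partitions}
opened, cleaned = init_from_persisted_state(partitions, metadata)
print(sorted(opened), cleaned)
```

When the whole disk is damaged, every partition fails to open, so the same loop ends with empty metadata, which matches all indexes being dropped.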

If index replicas had been present, the index metadata would not have been lost to the disk failure (assuming the disk on the node holding the replicas did not also fail).
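A toy model of the replica argument: since each node keeps its own copy of the metadata, an index definition survives a single node's disk failure as long as some other node hosts that index or one of its replicas. The placement map and node names below are hypothetical, and the sketch ignores how a lost replica is later restored.

```python
def surviving_definitions(placement: dict, failed_node: str) -> set:
    """Index names whose metadata is still held by at least one healthy node.

    placement maps an index name to the nodes that host it or one of its
    replicas; each of those nodes keeps its own local copy of the metadata.
    """
    return {name for name, nodes in placement.items()
            if any(node != failed_node for node in nodes)}


placement = {
    "idx_status": ["node1"],              # no replica
    "idx_user_date": ["node1", "node2"],  # one replica on node2
}
print(surviving_definitions(placement, "node1"))
```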

Some more detail on why this happens:
(1) The index service maintains the metadata for all indexes locally, on each node separately.

(2) Multiple indexes can share a single DCP stream from the data service to the index service, to avoid transferring the same data repeatedly. If one index is corrupted, it can affect all the other, non-corrupt indexes by forcing the shared DCP stream to roll back to zero. To avoid this, the index service deletes the index metadata along with the corrupt data.
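The point about the shared DCP stream can be illustrated with a toy model: a stream that feeds several indexes has to restart from the lowest sequence number any of them still needs. A corrupt index that is kept has lost its data and needs everything from seqno 0, which drags the healthy indexes back to zero with it. Dropping it lets the stream resume from where the healthy indexes left off. The single-vBucket simplification and all names here are assumptions; real DCP streams are per vBucket and involve rollback and failover logs that are not modelled.

```python
def restart_seqno(needed: dict) -> int:
    """A shared stream must restart from the lowest seqno any consumer needs."""
    return min(needed.values())


def restart_after_corruption(persisted: dict, corrupt: set, drop_corrupt: bool):
    """Seqno the shared stream restarts from after a corruption is detected.

    persisted maps index name -> last seqno it has safely on disk (a single
    vBucket, for simplicity). A corrupt index that is kept needs every
    mutation again, i.e. seqno 0. Returns None if no index is left.
    """
    needed = {}
    for name, seqno in persisted.items():
        if name not in corrupt:
            needed[name] = seqno
        elif not drop_corrupt:
            needed[name] = 0  # kept, but must be rebuilt from scratch
    return restart_seqno(needed) if needed else None


persisted = {"idx_a": 5000, "idx_b": 4800, "idx_c": 5100}
print(restart_after_corruption(persisted, {"idx_b"}, drop_corrupt=False))
print(restart_after_corruption(persisted, {"idx_b"}, drop_corrupt=True))
```

Keeping `idx_b` forces `idx_a` and `idx_c` to reprocess every mutation from the start; dropping it costs only the corrupt index.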

Production systems typically have index replicas, and their presence would have avoided this behaviour.
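GSI indexes can be created with replicas by passing `num_replica` in the `WITH` clause, optionally together with a `nodes` list. When both are given, the list is expected to name exactly `num_replica + 1` nodes, which is worth confirming against the documentation for your server version. The helper below is a hypothetical sketch that only builds the statement text; the node addresses are placeholders.

```python
import json


def create_with_replicas(name, bucket, keys, num_replica, nodes=None):
    """Build a CREATE INDEX statement that asks for num_replica replicas.

    keys are raw N1QL expressions; nodes, if given, are "host:port" strings
    with one entry per copy (the index itself plus each replica).
    """
    options = {"num_replica": num_replica}
    if nodes is not None:
        if len(nodes) != num_replica + 1:
            raise ValueError("nodes must list num_replica + 1 entries")
        options["nodes"] = list(nodes)
    return (f"CREATE INDEX `{name}` ON `{bucket}`({', '.join(keys)}) "
            f"WITH {json.dumps(options)}")


print(create_with_replicas("idx_status", "xxx", ["status"], 1))
print(create_with_replicas("idx_status", "xxx", ["status"], 1,
                           ["10.0.0.1:8091", "10.0.0.2:8091"]))
```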

We have opened an issue to further improve this behaviour.

Thanks.

Related topics

Lost indexes spontaneously (Couchbase Server): 10 replies, 1615 views, activity June 24, 2021
Indexes fail and must be recreated (SQL++; tag: index): 6 replies, 1926 views, activity January 16, 2019
Indexer on one server continuously crashing (Couchbase Server; tags: server, index): 5 replies, 3107 views, activity March 22, 2017
Indices are malformed after service stopped (Couchbase Server; tags: n1ql, index): 7 replies, 1245 views, activity November 4, 2019
Full disk - lost everything! (Couchbase Server): 7 replies, 1659 views, activity April 18, 2019