All indexes dropped after hardware failure

After what appear to have been disk problems, I rebooted the system and found that Couchbase had automatically deleted all the indexes. I found no way to restore only the indexes from the backup, so I had to recreate them manually. This is extremely strange database behaviour. Was it not possible to handle this case differently instead of deleting all the indexes?

Web console log:

Cleaup done for index xxx, partition id 0.
Detected storage corruption for index xxx, partition id 0. Starting cleanup.
(These two messages repeat for every index until all indexes have been deleted.)

Indexer log:

[Error] plasmaSlice:NewplasmaSlice Id 0x693ac0 IndexInstId 12332122332123322123 fatal error occured: Unable to initialize xxx.index\mainIndex, err = Bucket/xxx/Mainstore#12332122332123322123:0 : fatal: read failure in recovery: EOF
[Error] plasmaSlice:NewplasmaSlice Id 0 IndexInstId 12332122332123322123 PartitionId 0 fatal error occured: Storage corrupted and unrecoverable
[Info] Bucket/xxx/Backstore#12332122332123322123:0 Plasma: Disable page eviction before reaching quota.
[Error] Indexer:: initPartnInstance storage corruption for indexInst

Indexer::initFromPersistedState Starting cleanup for PartitionId: 0 Endpoints: [:9105]
Indexer::forceCleanupIndexPartition 12332122332123322123 0 mark metadata as deleted
ClustMgr:handleCleanupPartition&{{9876499879879879879 xxx plasma Bucket acb327acb273acb19287acb390acaacb false [field] N1QL SINGLE [false] false false false 0 false 0 {true 0 0 0} 0 0 0 0 0 0 0 0 0 0} 12332122332123322123 0 0 true}

Hi @Poltar,

I am sorry for the problems you have faced.

This behaviour is expected when there is disk corruption. When the index service detects that an index's on-disk data is corrupted, it deletes that index so that the rest of the indexes can remain available.
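To make the idea concrete, here is a toy Python model of that startup behaviour. It is not the indexer's actual code: `IndexInstance`, `open_storage`, `StorageCorruptedError` and `init_from_persisted_state` are hypothetical names (the last one only echoes `Indexer::initFromPersistedState` from the log above). It assumes each index either opens cleanly or fails with an unrecoverable error, and shows that a corrupt index is dropped while the healthy ones stay available.

```python
from dataclasses import dataclass


class StorageCorruptedError(Exception):
    """Hypothetical error: an index's on-disk data cannot be recovered."""


@dataclass
class IndexInstance:
    name: str
    corrupted: bool  # stands in for e.g. "read failure in recovery: EOF"


def open_storage(inst: IndexInstance) -> None:
    # Hypothetical stand-in for opening the index's storage files.
    if inst.corrupted:
        raise StorageCorruptedError(f"Storage corrupted and unrecoverable: {inst.name}")


def init_from_persisted_state(instances):
    """Open every index; drop only those whose storage cannot be recovered."""
    available, dropped = [], []
    for inst in instances:
        try:
            open_storage(inst)
            available.append(inst.name)
        except StorageCorruptedError:
            # Mark the metadata as deleted and clean up, rather than
            # refusing to start the whole index service.
            dropped.append(inst.name)
    return available, dropped


available, dropped = init_from_persisted_state([
    IndexInstance("idx_a", corrupted=False),
    IndexInstance("idx_b", corrupted=True),
])
print(available, dropped)  # ['idx_a'] ['idx_b']
```

In the situation described in the question, every index on the node hit the corruption path, which is why all of them were dropped.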

If index replicas had been present in the system, the index metadata would not have been lost because of the disk failure (assuming the disk did not also fail on the node hosting the replicas).
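A simplified way to see why replicas help: if each node only holds metadata for the index instances it hosts, then losing one node's disk loses only the definitions that existed nowhere else. The sketch below is an illustrative model under that assumption, not Couchbase's metadata format; `surviving_definitions` and the node and index names are hypothetical.

```python
def surviving_definitions(placement, failed_node):
    """Return the index names whose definition survives losing one node's disk.

    placement maps node -> set of index names whose metadata that node holds
    locally (its own instances, including replicas).
    """
    survivors = set()
    for node, indexes in placement.items():
        if node != failed_node:
            survivors |= indexes
    return survivors


placement = {
    "node1": {"idx_orders", "idx_users"},
    "node2": {"idx_orders"},  # replica of idx_orders
}
print(sorted(surviving_definitions(placement, "node1")))  # ['idx_orders']
```

Here `idx_users` had no replica, so its definition is gone along with node1's disk, while `idx_orders` can still be served from node2.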

Some more detail on why this happens:
(1) The index service maintains index metadata locally, on each node separately.

(2) Multiple indexes can share a single DCP stream from the data service to the index service, to avoid transferring the same data repeatedly. If one index is corrupted, it can affect all the other, non-corrupt indexes by rolling the shared DCP stream back to zero. To avoid this, the index service deletes the index metadata along with the corrupt data.
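The effect described in (2) can be illustrated with a toy model. Assume, for simplicity, that a shared stream must restart from the lowest sequence number that any of its consumers has persisted, so that no index misses data; a corrupt index that has lost everything effectively sits at 0. Real DCP tracks sequence numbers per vBucket and has its own rollback rules, so this single-number model and the `stream_restart_seqno` helper are assumptions made purely for illustration.

```python
def stream_restart_seqno(persisted_seqnos):
    """Seqno a shared stream must restart from so no consumer misses data.

    persisted_seqnos maps index name -> last sequence number that index has
    durably persisted. A corrupt index that lost its data effectively has 0.
    """
    return min(persisted_seqnos.values(), default=0)


healthy = {"idx_a": 5000, "idx_b": 4800}
print(stream_restart_seqno(healthy))  # 4800

with_corrupt = {**healthy, "idx_c": 0}  # idx_c must be rebuilt from scratch
print(stream_restart_seqno(with_corrupt))  # 0: every index re-receives all data

# Dropping the corrupt index lets the healthy ones resume where they left off.
with_corrupt.pop("idx_c")
print(stream_restart_seqno(with_corrupt))  # 4800
```

In this model, keeping the corrupt index would drag every healthy index sharing its stream back to the beginning, which is the cost that deleting it avoids.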

Typical production systems have index replicas, and their presence would have avoided this behaviour.
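When recreating indexes, replicas can be requested in the N1QL `CREATE INDEX` statement through the `WITH {"num_replica": ...}` option. The small Python helper below only builds such a statement as a string; `create_index_with_replicas` and the index name `idx_field` are hypothetical, while `xxx` and `field` are taken from the logs above. The statement itself would still need to be run through the Query service (for example cbq or the Query Workbench), and each replica needs its own index node, so `num_replica` should be lower than the number of index nodes.

```python
import json


def create_index_with_replicas(index_name, keyspace, fields, num_replica=1):
    """Build a N1QL CREATE INDEX statement that asks for index replicas."""
    field_list = ", ".join(f"`{f}`" for f in fields)
    options = json.dumps({"num_replica": num_replica})
    return f"CREATE INDEX `{index_name}` ON `{keyspace}`({field_list}) WITH {options}"


print(create_index_with_replicas("idx_field", "xxx", ["field"]))
# CREATE INDEX `idx_field` ON `xxx`(`field`) WITH {"num_replica": 1}
```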

We have opened https://issues.couchbase.com/browse/MB-47544 to improve this behaviour further.

Thanks.