Corrupted files preventing me from doing anything

Hi,

I had two cloud virtual machines with Couchbase installed on them, with the data shared between the two. They had to be shut down for maintenance. The data was saved, and new machines were started using that data.

I logged into one of them, and I tried “/etc/init.d/couchbase-server start” to get Couchbase running again. It gave me this error:

cp: cannot stat ‘/opt/couchbase/var/lib/couchbase/ip’: No such file or directory

So then I tried “sudo apt install couchbase-server”. It recognized that I already have an installation (see 3rd screenshot) and tried to update Couchbase to the latest version, but it errored out because it could not find ‘stats.json’ and ‘stats.json.old’. These files have apparently become corrupted.

Same story on the other VM, except that there it is ‘stats.json.old’ of a different bucket that got corrupted, while ‘stats.json’ is fine.

So I have 3 corrupted files: 2 on one machine and 1 on the other. I also tried just removing everything, but these files prevent me from doing even that.

Does anyone have any idea what I can do?

A couple of examples:
[image]

@Poofcakes have you tried to run fsck on that filesystem when both virtual machines unmount it?

I’m not entirely sure how to do that. This was all set up by my supervisor. The Couchbase data with the corrupted files is saved on the mounted drive though, so if I unmount it, then it wouldn’t scan the corrupted files, right?

You have to unmount the filesystem before running fsck. Otherwise it will not check and fix the errors. What I would do is stop all the machines, start only one node that can use that filesystem, log into that machine, unmount the filesystem with the corrupted files, and run fsck on it.

The other way might be to stop everything, mount the filesystem on the host machine, and run fsck there.
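
For reference, the steps described above could be sketched roughly as follows. This is only an illustration: the device name and mount point are placeholders that do not come from this thread, and the sketch assumes the Couchbase init script accepts "stop" the same way it accepts "start". It defaults to a dry run that only prints the commands. Note that fsck hands off to a filesystem-specific checker, and on XFS that checker does nothing, so XFS volumes need xfs_repair instead.

```shell
#!/bin/sh
# Sketch only: DEVICE and MOUNTPOINT are placeholders, not values from this thread.
DEVICE="${DEVICE:-/dev/sdX1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check_filesystem() {
    # Show which device and filesystem type back the mount point.
    run findmnt -n -o SOURCE,FSTYPE --target "$MOUNTPOINT"
    # Stop Couchbase so nothing keeps files open on the volume.
    run /etc/init.d/couchbase-server stop
    # fsck must only be run on an unmounted filesystem.
    run umount "$MOUNTPOINT"
    run fsck "$DEVICE"
    # Mount the volume again once the check has finished.
    run mount "$DEVICE" "$MOUNTPOINT"
}

check_filesystem
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 as root and with the real device and mount point to actually execute them.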

Thank you. We managed to clear the broken files by unmounting the drive and running the xfs_repair command on the device. :slight_smile:
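
For anyone landing here later, a rough sketch of that repair is below. The exact commands used in this thread were not posted, so the device and mount point are placeholders, and the preliminary `xfs_repair -n` pass (which only reports problems without changing anything) is an extra precaution added here. The script defaults to a dry run that just prints the commands.

```shell
#!/bin/sh
# Sketch only: DEVICE and MOUNTPOINT are placeholders, not values from this thread.
DEVICE="${DEVICE:-/dev/sdX1}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_xfs() {
    # xfs_repair refuses to work on a mounted filesystem.
    run umount "$MOUNTPOINT"
    # -n is no-modify mode: report what would be fixed, change nothing.
    run xfs_repair -n "$DEVICE"
    # The actual repair pass.
    run xfs_repair "$DEVICE"
    run mount "$DEVICE" "$MOUNTPOINT"
}

repair_xfs
```

With DRY_RUN=0 this needs root, and Couchbase should be stopped on every machine that uses the volume before unmounting.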

Now we have issues with getting the buckets back as they were. The cluster is gone, the nodes are gone, the buckets are gone. The data still exists, but Couchbase doesn’t recognize the data, even if a cluster is made where the data location points straight to the data. Is it possible to salvage this or are we doomed to remake the buckets and re-import everything?

EDIT:
Made a new thread for this issue:

EDIT2: Managed to resolve it by just remaking the buckets. This time it didn’t overwrite and remove all the data, but instead the buckets started slowly filling with the data again.
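
A minimal sketch of recreating a bucket from the command line, in case it helps someone in the same situation. The thread does not say how the buckets were recreated, so this is an assumption: it uses `couchbase-cli bucket-create`, and the cluster address, credentials, bucket name, RAM quota, and data directory are all placeholders. Because an earlier attempt apparently overwrote data, the sketch copies the data directory aside first. Whether an existing data directory is picked up again is not guaranteed. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not taken from this thread.
CLUSTER="${CLUSTER:-127.0.0.1:8091}"
ADMIN_USER="${ADMIN_USER:-Administrator}"
ADMIN_PASS="${ADMIN_PASS:-password}"
BUCKET="${BUCKET:-mybucket}"
RAM_MB="${RAM_MB:-256}"
DATA_DIR="${DATA_DIR:-/opt/couchbase/var/lib/couchbase/data}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recreate_bucket() {
    # Keep a copy of the on-disk data in case recreation overwrites it.
    run cp -a "$DATA_DIR" "$DATA_DIR.bak"
    # Recreate the bucket under its original name.
    run couchbase-cli bucket-create -c "$CLUSTER" \
        -u "$ADMIN_USER" -p "$ADMIN_PASS" \
        --bucket "$BUCKET" --bucket-type couchbase \
        --bucket-ramsize "$RAM_MB"
}

recreate_bucket
```

The bucket name should match the name of the bucket's directory under the data path, since that is how the existing files are associated with the bucket.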