Thanks Dribgy! We were not actively processing any data during the failure simulation, but that is something I want to understand in more depth at a later date. My goal with this simulation is to better understand the true fault-tolerance of Couchbase.
“***There are not replica (backup) copies of all data on this node!***” It’s tough to read between the lines on that statement. To me, it reads as a statement of fact: the database system is saying we are definitely going to lose data. I understand erring on the side of caution, but that statement immediately kills any credibility/trust in the fault-tolerance.
Is there any way to increase the fault-tolerance to the point where the database can be sure that we are -not- losing any data that’s persisted to disk? Would having a third node help it determine whether sufficient replicas are available? I understand that we may lose data for active writes in the write cache, but that’s an understandable risk with any application that caches writes.
We are assuming that writes in cache are persisted to disk as soon as disk resources are available, meaning that if the system is under low load, small writes would be persisted to disk immediately. Is that correct, or do we need to programmatically confirm that writes were persisted to disk?