Yesterday one node (node-3) of our 4-node cluster was failed over because the root filesystem (/) was full (the Couchbase directory was not full). Or it crashed and then filled / with a crash dump; we are not sure. Since then the remaining 3 nodes have been in an unstable state. Rebalance is not possible, and views are neither updated nor visible in the GUI (for one bucket the views are shown; for the second bucket the GUI waits and then ends with “connection timeout”).
From the logs on node-1:
memcached<0.88.0>: Tue Dec 20 07:46:44.041605 CET 3: (community) DCP (Producer) eq_dcpq:mapreduce_view: community _design/Person (prod/replica) - (vb 87) Stream request failed because this vbucket is in backfill state
Tue Dec 20 07:47:15.945104 CET 3: (community) DCP (Producer) eq_dcpq:mapreduce_view: community _design/Person (prod/replica) - (vb 87) Stream request failed because this vbucket is in backfill state
[couchdb:info,2016-12-20T7:47:56.927,ns_1@xx:<0.28354.158>:couch_log:info:41]dcp client (<0.25310.0>): Temporary failure on stream request on partition 87. Retrying…
These messages appear every few seconds.
Same on node-2 with vbucket 131.
Same on node-4 with vbucket 78.
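To confirm that only one vbucket per node is affected, the repeating messages can be tallied per bucket and vbucket. This is only a minimal sketch, assuming the memcached log lines keep exactly the format quoted above; the function name `count_backfill_failures` is made up for illustration and is not part of any Couchbase tool.

```python
import re
from collections import Counter

# Matches the memcached DCP producer message quoted above, e.g.
# "... (community) DCP (Producer) ... (vb 87) Stream request failed
#  because this vbucket is in backfill state"
BACKFILL_RE = re.compile(
    r"\((?P<bucket>[^)]+)\) DCP \(Producer\).*?"
    r"\(vb (?P<vb>\d+)\) Stream request failed "
    r"because this vbucket is in backfill state"
)


def count_backfill_failures(lines):
    """Count backfill-state stream failures per (bucket, vbucket) pair.

    Hypothetical helper: `lines` is any iterable of log lines, for
    example an open log file. Lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = BACKFILL_RE.search(line)
        if match:
            key = (match.group("bucket"), int(match.group("vb")))
            counts[key] += 1
    return counts
```

Passing an open log file (or the pasted lines as a list) returns a `Counter` keyed by `(bucket, vbucket)`, so a single stuck vbucket per node shows up as a single key with a steadily growing count.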
Any ideas on how to fix this?
Update: Even with help from enterprise support we could not repair the cluster. It looks like we ran into a very rare issue with replicas. We then set up a second cluster, copied everything to it with XDCR without any problems, and switched over. So at least we could resolve this without any data loss.