We received this error: “NodeUnavailableException: The node xx.xxx.xx.xx:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception”.
It occurred from 3:00:18.044 PM until 3:02:07.569 PM. Our team was doing maintenance on the cluster machines one by one: gracefully shutting each machine down, adding it back, and then rebalancing. The bucket is a Memcached bucket, and the cluster has 2 machines.
We expected the .NET SDK client v2.4.8 to read from the other, still functioning node in this scenario, as described in https://developer.couchbase.com/documentation/server/3.x/developer/dev-guide-3.0/failover.html
Is there any parameter (SendTimeout, NodeAvailableCheckInterval, IOErrorThreshold, etc.) that we can tweak to avoid this kind of error?
Any help is appreciated.
With a MemcachedBucket, you’ll have to redistribute the keys via the rebalance and/or read from the source (bypassing the cache), since the cache is now missing half the data (you removed one of the two nodes). When you get the NodeUnavailableException, the client hasn’t yet caught up with the fact that the cluster has changed, and the rebalance has not completed.
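A minimal sketch of the "read from the source" fallback, written in Python rather than C# and using no Couchbase API at all: `NodeUnavailableError`, `FakeCache`, and `get_with_fallback` are hypothetical stand-ins for the SDK's exception and the Memcached bucket. When the key's node is unreachable the read goes straight to the source of truth; on an ordinary cache miss it repopulates the cache.

```python
class NodeUnavailableError(Exception):
    """Hypothetical stand-in for the SDK's NodeUnavailableException."""


class FakeCache:
    """Hypothetical in-memory cache where some keys map to a node that is down."""

    def __init__(self, data, down_keys):
        self._data = dict(data)
        self._down_keys = set(down_keys)

    def get(self, key):
        if key in self._down_keys:
            raise NodeUnavailableError(f"node for {key!r} is unreachable")
        return self._data.get(key)  # None means an ordinary cache miss

    def set(self, key, value):
        if key in self._down_keys:
            raise NodeUnavailableError(f"node for {key!r} is unreachable")
        self._data[key] = value


def get_with_fallback(cache, key, load_from_source):
    """Cache-aside read that bypasses the cache when the key's node is down."""
    try:
        value = cache.get(key)
    except NodeUnavailableError:
        # The node owning this key is unreachable: serve from the source of
        # truth and skip repopulating, since a write would fail the same way.
        return load_from_source(key)
    if value is None:
        value = load_from_source(key)
        try:
            cache.set(key, value)  # normal miss: repopulate the cache
        except NodeUnavailableError:
            pass  # best effort; the caller still gets the value
    return value
```

For example, with `cache = FakeCache({"a": 10}, down_keys={"b"})`, reading `"b"` falls through to the source instead of surfacing the exception to the caller.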
"During node failure, Couchbase SDKs receive errors trying to read or write any data that is on a failed node."
With a Couchbase bucket, you could enable replicas and, when the app detects this error, do a replica read to fetch the key from another server.
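The replica-read fallback can be sketched the same way. This is a Python illustration, not the .NET SDK: `FakeCouchbaseBucket`, its `get_from_replica` method, and `get_active_or_replica` are hypothetical names, and the sketch assumes the bucket has at least one replica configured. A replica can lag the active copy, so the fallback may return slightly stale data.

```python
class NodeUnavailableError(Exception):
    """Hypothetical stand-in for the SDK's NodeUnavailableException."""


class FakeCouchbaseBucket:
    """Hypothetical bucket where each key has an active copy and one replica."""

    def __init__(self, active, replica, down_nodes):
        # active and replica map key -> (node_name, value)
        self._active = active
        self._replica = replica
        self._down = set(down_nodes)

    def _read(self, copies, key):
        node, value = copies[key]
        if node in self._down:
            raise NodeUnavailableError(f"{node} is unreachable")
        return value

    def get(self, key):
        return self._read(self._active, key)

    def get_from_replica(self, key):
        return self._read(self._replica, key)


def get_active_or_replica(bucket, key):
    """Read the active copy; if its node is down, fall back to a replica read."""
    try:
        return bucket.get(key)
    except NodeUnavailableError:
        # The replica lives on a different node, so it is still reachable.
        return bucket.get_from_replica(key)
```

With `node2` down, a key whose active copy lives on `node2` is served from its replica on `node1`. This only works because a Couchbase bucket keeps copies on other nodes; a Memcached bucket has no replicas to fall back to.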
Thanks for the quick response. So with a Memcached bucket, the SDK will keep throwing NodeUnavailableException until the rebalance completes. Is my understanding correct?
I was hoping to handle this exception by getting the updated list of available nodes (all nodes minus the one that failed) and forwarding the request to them. Is that possible with the .NET SDK?
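A consistent-hash ring sketch shows why forwarding the request to the surviving node would not recover the data. As we understand it, Memcached-bucket keys are mapped to nodes client-side with a ketama-style consistent hash. The `HashRing` below is a simplified, hypothetical illustration (MD5 points on a ring, 160 points per node chosen arbitrarily), not the SDK's exact algorithm. When the node list shrinks from two nodes to one, only the keys that lived on the removed node move, and they land on the survivor, which never stored them.

```python
import bisect
import hashlib


def _hash(s):
    # First 4 bytes of an MD5 digest as an unsigned int, in the spirit of ketama.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "little")


class HashRing:
    """Simplified consistent-hash ring (illustrative, not the SDK's implementation)."""

    def __init__(self, nodes, points_per_node=160):
        self._ring = sorted(
            (_hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        # Next point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user::{i}" for i in range(1000)]
before = HashRing(["nodeA", "nodeB"])  # hypothetical two-node cluster
after = HashRing(["nodeA"])            # node list once nodeB is removed

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every remapped key previously lived on nodeB; nodeA has never stored it,
# so each read is a cache miss until the value is reloaded from the source.
assert all(before.node_for(k) == "nodeB" for k in moved)
print(f"{len(moved)} of {len(keys)} keys remapped")
```

In other words, once the client has the updated node list it already routes these keys to the remaining node; the result is a miss rather than an exception, which leads back to reading from the source.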