Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Blocker
-
Resolution: Fixed
-
Affects Version/s: 1.8.0
-
Component/s: ns_server
-
Security Level: Public
-
Labels:
Description
this issue was reported by one of the users where autofailover was triggered twice instead of once on a cluster.
the root cause is still under investigation by aliaksey
the root cause is still under investigation by aliaksey
* node .238 that's master and runs autofailover service starts to have some networking issues and is split from rest of cluster
* cluster elects .240 as new master
* autofailover on new master fails over .238
* network problems on .238 are somehow resolved and connection to rest of cluster is restored
* now cluster has 2 masters briefly: .238 and .240. .240 surrenders mastership to .238
* now .238 is the only master and things are fine, except it's autofailover service is not aware of automatic failover that happened when .238 was disconnected
* when some other node has problems .238 fails it over automatically
So the fix is to make sure autofailover service is always using latest autofailover count that's stored in config.