Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.8.0
Component/s: ns_server
Security Level: Public
Labels:
Description
This issue was reported by a user whose cluster triggered autofailover twice instead of once.
The root cause is still under investigation by Aliaksey.
Activity
Dipti Borkar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Aleksey Kondratenko [ alkondratenko ] | |
| Priority | Major [ 3 ] | Critical [ 2 ] |
Dipti Borkar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Priority | Critical [ 2 ] | Blocker [ 1 ] |
Dipti Borkar
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Fix Version/s | | 1.8.1 [ 10295 ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Summary | autofailover may failover two nodes automatically within 1 minute if the master node is failed over and the old master nodes is elected as the master agaij | autofailover may failover two nodes automatically within 1 minute if the master node is failed over and the old master nodes is elected as the master again |
Aleksey Kondratenko
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Assignee | Aleksey Kondratenko [ alkondratenko ] | Aliaksey Artamonau [ Aliaksey Artamonau ] |
| Resolution | | Fixed [ 1 ] |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Labels | | 1.8.1-release-notes |
Farshid Ghods
made changes -
| Field | Original Value | New Value |
|---|---|---|
| Status | Resolved [ 5 ] | Closed [ 6 ] |
* node .238, which is the master and runs the autofailover service, starts having network issues and is partitioned from the rest of the cluster
* the cluster elects .240 as the new master
* the autofailover service on the new master fails over .238
* the network problems on .238 clear up and its connection to the rest of the cluster is restored
* the cluster now briefly has two masters, .238 and .240; .240 surrenders mastership to .238
* .238 is now the only master and everything looks fine, except that its autofailover service is not aware of the automatic failover that happened while .238 was disconnected
* when another node has problems, .238 fails it over automatically, resulting in a second automatic failover
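The sequence above can be reproduced in a small toy model. This is a minimal sketch, not ns_server's actual code: it assumes the buggy service snapshots the failover count in memory when it starts, that a single shared `ClusterConfig` object stands in for the replicated config once the partition heals, and that only one automatic failover is allowed. The class names, `MAX_AUTO_FAILOVERS`, and node `.241` (the "other node") are hypothetical.

```python
# Hypothetical toy model of the bug; names are illustrative only.
MAX_AUTO_FAILOVERS = 1  # assumption: only one automatic failover is allowed


class ClusterConfig:
    """Stands in for the replicated cluster config after the partition heals."""

    def __init__(self):
        self.auto_failover_count = 0  # persisted count shared by all nodes
        self.failed_over = []         # nodes that were failed over automatically


class StaleAutoFailoverService:
    """Buggy variant: copies the count once at startup and never re-reads it."""

    def __init__(self, config):
        self.config = config
        self.count = config.auto_failover_count  # stale snapshot

    def maybe_failover(self, node):
        # Decision is based on the private copy, not on the config.
        if self.count >= MAX_AUTO_FAILOVERS:
            return False
        self.count += 1
        self.config.auto_failover_count += 1
        self.config.failed_over.append(node)
        return True


config = ClusterConfig()
svc_238 = StaleAutoFailoverService(config)  # .238 is master, snapshot count = 0
svc_240 = StaleAutoFailoverService(config)  # .240 elected during the partition
svc_240.maybe_failover(".238")              # first failover, config count -> 1
# .238 rejoins and takes mastership back; its snapshot is still 0
svc_238.maybe_failover(".241")              # second failover slips through
print(config.failed_over)
```

Because `.238`'s service never re-reads the persisted count, the limit check passes a second time and two nodes end up failed over.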
So the fix is to make sure the autofailover service always uses the latest autofailover count stored in the config.
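A minimal sketch of the fix, assuming a toy model in which a shared `ClusterConfig` object stands in for the replicated config and only one automatic failover is allowed. The service keeps no private copy of the count and reads `auto_failover_count` from the config on every decision. All names here, including node `.241`, are hypothetical and not ns_server's real API.

```python
# Hypothetical toy model of the fix; names are illustrative only.
MAX_AUTO_FAILOVERS = 1  # assumption: only one automatic failover is allowed


class ClusterConfig:
    """Stands in for the replicated cluster config."""

    def __init__(self):
        self.auto_failover_count = 0
        self.failed_over = []


class AutoFailoverService:
    """Fixed variant: no cached counter, the config is the single source of truth."""

    def __init__(self, config):
        self.config = config

    def maybe_failover(self, node):
        # Always consult the latest persisted count before failing over.
        if self.config.auto_failover_count >= MAX_AUTO_FAILOVERS:
            return False
        self.config.auto_failover_count += 1
        self.config.failed_over.append(node)
        return True


config = ClusterConfig()
svc_238 = AutoFailoverService(config)      # old master
svc_240 = AutoFailoverService(config)      # master elected during the partition
first = svc_240.maybe_failover(".238")     # allowed: count 0 -> 1
second = svc_238.maybe_failover(".241")    # refused: config already shows 1
print(first, second, config.failed_over)
```

With the count read from the config, the old master sees the failover performed by `.240` and refuses the second one.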