[MB-4906] autofailover may fail over two nodes automatically within 1 minute if the master node is failed over and the old master node is elected as the master again Created: 17/Mar/12 Updated: 09/Jan/13 Resolved: 30/Mar/12 |
|
| Status: | Closed |
| Project: | Couchbase Server |
| Component/s: | ns_server |
| Affects Version/s: | 1.8.0 |
| Fix Version/s: | 1.8.1, 1.8.2 |
| Security Level: | Public |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Farshid Ghods | Assignee: | Aliaksey Artamonau |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | 1.8.1-release-notes | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
This issue was reported by a user whose cluster had autofailover triggered twice instead of once.
The root cause is still under investigation by Aliaksey. |
| Comments |
| Comment by Aleksey Kondratenko [ 17/Mar/12 ] |
|
So this is what happened.
* Node .238, which is master and runs the autofailover service, starts to have networking issues and is split from the rest of the cluster.
* The cluster elects .240 as the new master.
* Autofailover on the new master fails over .238.
* The network problems on .238 are somehow resolved and its connection to the rest of the cluster is restored.
* The cluster now briefly has 2 masters: .238 and .240. .240 surrenders mastership to .238.
* Now .238 is the only master and things are fine, except that its autofailover service is not aware of the automatic failover that happened while .238 was disconnected.
* When some other node has problems, .238 fails it over automatically.
So the fix is to make sure the autofailover service always uses the latest autofailover count that's stored in config. |
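The sequence above can be replayed in a small Python sketch. This is only an illustration, not the actual change to src/auto_failover.erl: the class names, the `replay` helper, the shared in-memory config object, the node name `other-node`, and the limit of one automatic failover are all assumptions made for the example.

```python
class ClusterConfig:
    """Hypothetical stand-in for the replicated cluster config."""

    def __init__(self, max_auto_failovers=1):
        self.max_auto_failovers = max_auto_failovers  # assumed limit of 1
        self.auto_failover_count = 0
        self.failed_over = []


class CachedCountService:
    """Buggy variant: trusts a private counter taken at service start."""

    def __init__(self, config):
        self.config = config
        self.count = config.auto_failover_count  # goes stale during a partition

    def maybe_failover(self, node):
        if self.count >= self.config.max_auto_failovers:
            return False
        self.count += 1
        self.config.auto_failover_count += 1
        self.config.failed_over.append(node)
        return True


class ConfigCountService:
    """Fixed variant: always reads the latest count stored in config."""

    def __init__(self, config):
        self.config = config

    def maybe_failover(self, node):
        if self.config.auto_failover_count >= self.config.max_auto_failovers:
            return False
        self.config.auto_failover_count += 1
        self.config.failed_over.append(node)
        return True


def replay(service_cls):
    """Replay the scenario from the comment with the given service variant."""
    config = ClusterConfig()
    old_master = service_cls(config)    # .238, running before the partition
    new_master = service_cls(config)    # .240, elected while .238 is cut off
    new_master.maybe_failover(".238")   # first automatic failover
    # .238 reconnects and regains mastership; another node then has problems.
    old_master.maybe_failover("other-node")
    return config.failed_over


if __name__ == "__main__":
    print("cached count:", replay(CachedCountService))
    print("config count:", replay(ConfigCountService))
```

With the cached counter the second failover goes through, because the old master never saw the first one. With the count read from config the second failover is refused.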
| Comment by Aleksey Kondratenko [ 30/Mar/12 ] |
| Done http://review.couchbase.org/14411 |
| Comment by Aleksey Kondratenko [ 05/Apr/12 ] |
| Somehow the fix went into 1.8.2 but not 1.8.1. Will backport. |
| Comment by Thuan Nguyen [ 05/Apr/12 ] |
|
Integrated in github-ns-server-2-0 #328 (See [http://qa.hq.northscale.net/job/github-ns-server-2-0/328/]) bp: Result = SUCCESS Aliaksey Kandratsenka : Files : * src/auto_failover.erl |