In versions prior to 7.1, auto-failover of the index service was not supported. But due to Data Service preference, the index service would still fail over (automatically). Can someone help me understand the difference between the “auto-failover of the index service in lower versions, which would take place due to Data Service preference” and the “auto-failover introduced starting with version 7.1”?
a. The cluster manager checks the index service health every few seconds, and if it is not healthy, the node will be automatically failed over.
b. Also, the index service tries to ensure index availability after auto-failover, e.g. if an index exists only on node A and has no replica, failing over node A will lead to complete unavailability of that index. Auto-failover is not allowed in such a case. However, the Data Service preference takes precedence over this.
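The two checks in a. and b. can be sketched roughly as follows. This is a minimal Python sketch; the names and data structures (`Index`, `Cluster`, `can_autofailover`, the `dsp` flag) are illustrative assumptions, not actual Couchbase internals:

```python
from dataclasses import dataclass

@dataclass
class Index:
    name: str
    nodes: set  # nodes holding this index or one of its replicas

@dataclass
class Cluster:
    indexes: list
    dsp: bool = False  # Data Service preference enabled?

def can_autofailover(node: str, healthy: bool, cluster: Cluster) -> bool:
    # (a.) Only an unhealthy node is a failover candidate.
    if healthy:
        return False
    # (b.) Refuse failover if some index exists only on this node
    # (no replica elsewhere) -- unless Data Service preference
    # takes precedence over the availability guard.
    for idx in cluster.indexes:
        if idx.nodes == {node}:
            return cluster.dsp
    return True
```

With DSP enabled, the availability guard in b. is bypassed, which is what allows the node to fail over even when an index would become unavailable.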
b. is clear, but in a. I cannot make out the difference between 6.6 with DSP and 7.1. Although the cluster manager does not check index service health in 6.6 with DSP, ultimately the index service fails over in both cases.
Also, regarding the example in b.: if the index service is down, the index is not accessible anyway, so not doing auto-failover does not make things any better. It does not ensure index availability either way. At least if auto-failover had happened, the cluster map would have been correctly updated and requests would not have hit the failed node, isn’t it?
Lower server versions, e.g. 6.6, had auto-failover of the index service disabled to avoid the possibility of false failovers due to CPU saturation. 7.1 has automatic CPU throttling built in for the indexer process once it exceeds a high threshold. This ensures that the cluster manager component running on the indexer node can still communicate with the master node and avoid a node failover.
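A threshold-based throttle like the one described could look roughly like this. This is an illustrative Python sketch only; the 90% threshold, the linear back-off, and the `throttle_factor` name are assumptions, not the indexer's actual policy:

```python
def throttle_factor(cpu_percent: float, high: float = 90.0) -> float:
    """Return the fraction of work the indexer should admit.

    Above the high threshold, scale work down so that other processes
    on the node (e.g. the cluster manager) keep getting CPU time and
    can answer the master node's health probes.
    """
    if cpu_percent <= high:
        return 1.0  # below the threshold: no throttling
    # Linearly back off between the threshold and full saturation,
    # never dropping below a small floor so indexing still progresses.
    return max(0.1, (100.0 - cpu_percent) / (100.0 - high))
```

The key point is only that throttling keeps the node responsive, so the cluster manager is reachable and no false failover is triggered.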
Also, 7.1 is more advanced in that the index service can signal that it is unhealthy, as opposed to the cluster manager merely being unable to reach the node. This allows more capabilities to be added in the future, e.g. if the indexer detects a bad disk because disk-write failures exceed a threshold, it can signal that it is unhealthy.
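The bad-disk example could be modeled as a sliding-window failure counter. This is a hypothetical Python sketch; the class name, window size, and threshold are assumptions for illustration:

```python
import time
from collections import deque

class DiskHealthMonitor:
    """Signal 'unhealthy' when disk-write failures within a sliding
    time window exceed a threshold (illustrative only)."""

    def __init__(self, max_failures: int = 5, window_secs: float = 60.0):
        self.max_failures = max_failures
        self.window_secs = window_secs
        self.failures = deque()  # timestamps of recent failed writes

    def record_write(self, ok: bool, now: float = None):
        now = time.monotonic() if now is None else now
        if not ok:
            self.failures.append(now)
        # Drop failures that have fallen out of the window.
        while self.failures and now - self.failures[0] > self.window_secs:
            self.failures.popleft()

    def unhealthy(self) -> bool:
        # This is the signal the indexer would report to the cluster
        # manager, rather than waiting to become unreachable.
        return len(self.failures) > self.max_failures
```

The difference from pre-7.1 is that the service reports its own condition instead of the cluster manager inferring it from lost connectivity.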
The issue with that is that index metadata is local to a node. If the last replica of an index is lost, the system cannot repair it on subsequent capacity addition and rebalance, because the metadata is lost.
For the index service, all index replicas are considered active, and query requests are automatically load-balanced across them. If a node is unhealthy, queries will automatically use the other replica. There is no dependency on a cluster-map update as such.
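That replica-aware balancing can be sketched as round-robin selection that skips unhealthy nodes. This is illustrative Python only, not the actual client logic; `ReplicaBalancer` and `is_healthy` are invented names:

```python
import itertools

class ReplicaBalancer:
    """Round-robin over index replicas, skipping unhealthy nodes."""

    def __init__(self, replicas):
        self.cycle = itertools.cycle(replicas)
        self.count = len(replicas)

    def pick(self, is_healthy):
        # Try each replica at most once per request; the first healthy
        # one wins, so no cluster-map update is needed to route around
        # a failed node.
        for _ in range(self.count):
            node = next(self.cycle)
            if is_healthy(node):
                return node
        raise RuntimeError("no healthy replica available")
```

Because every replica is active, routing around a dead node is a per-request decision rather than a cluster-wide state change.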