Nodes continually marked down and up after upgrading to 1.7.1.1
For the past couple of weeks, nodes in our cluster have been marking each other down and then back up. It happens once or twice per hour. There doesn't seem to be an exact pattern.
Here's what our log looks like:
--------------------------------
Event | Module Code | Server Node | Time
Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.126' came up. | ns_node_disco004 | ns_1@192.168.1.128 | 09:39:27 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.126' saw that node 'ns_1@192.168.1.128' came up. | ns_node_disco004 | ns_1@192.168.1.126 | 09:39:27 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.126' went down. | ns_node_disco005 | ns_1@192.168.1.128 | 09:39:26 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.126' saw that node 'ns_1@192.168.1.128' went down. | ns_node_disco005 | ns_1@192.168.1.126 | 09:39:26 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.125' saw that node 'ns_1@192.168.1.128' came up. | ns_node_disco004 | ns_1@192.168.1.125 | 08:26:29 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.125' came up. | ns_node_disco004 | ns_1@192.168.1.128 | 08:26:29 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.125' saw that node 'ns_1@192.168.1.128' went down. | ns_node_disco005 | ns_1@192.168.1.125 | 08:26:17 - Mon Oct 31, 2011
Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.125' went down. | ns_node_disco005 | ns_1@192.168.1.128 | 08:26:16 - Mon Oct 31, 2011
--------------------
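To look for a pattern in these entries, one option is to pair each "went down" line with the next "came up" line for the same pair of nodes and see how long each outage lasted. The following is only a rough sketch, assuming the log lines keep the wording and timestamp format shown above. The names `parse_line`, `outage_durations`, and `SAMPLE` are made up for this example and are not part of the product.

```python
import re
from datetime import datetime

# Assumed line shape: "Node 'A' saw that node 'B' came up|went down. ... HH:MM:SS - Day Mon DD, YYYY"
LINE_RE = re.compile(
    r"Node '(?P<observer>[^']+)' saw that node '(?P<peer>[^']+)' "
    r"(?P<event>came up|went down)\..*?"
    r"(?P<time>\d{2}:\d{2}:\d{2} - \w{3} \w{3} \d{1,2}, \d{4})"
)


def parse_line(line):
    """Return (observer, peer, event, timestamp), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # %a and %b assume English day and month abbreviations (C/English locale).
    when = datetime.strptime(m.group("time"), "%H:%M:%S - %a %b %d, %Y")
    return m.group("observer"), m.group("peer"), m.group("event"), when


def outage_durations(lines):
    """Pair each 'went down' with the next 'came up' for the same (observer, peer)."""
    events = sorted(filter(None, map(parse_line, lines)), key=lambda e: e[3])
    down_since = {}
    outages = []
    for observer, peer, event, when in events:
        key = (observer, peer)
        if event == "went down":
            down_since[key] = when
        elif key in down_since:
            start = down_since.pop(key)
            outages.append((observer, peer, (when - start).total_seconds()))
    return outages


# Four entries copied from the log above (newest first, as the log shows them).
SAMPLE = [
    "Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.126' came up. ns_node_disco004 ns_1@192.168.1.128 09:39:27 - Mon Oct 31, 2011",
    "Node 'ns_1@192.168.1.128' saw that node 'ns_1@192.168.1.126' went down. ns_node_disco005 ns_1@192.168.1.128 09:39:26 - Mon Oct 31, 2011",
    "Node 'ns_1@192.168.1.125' saw that node 'ns_1@192.168.1.128' came up. ns_node_disco004 ns_1@192.168.1.125 08:26:29 - Mon Oct 31, 2011",
    "Node 'ns_1@192.168.1.125' saw that node 'ns_1@192.168.1.128' went down. ns_node_disco005 ns_1@192.168.1.125 08:26:17 - Mon Oct 31, 2011",
]

for observer, peer, seconds in outage_durations(SAMPLE):
    print(f"{observer} lost {peer} for {seconds:.0f} s")
```

On these four entries the gaps come out to 12 seconds and 1 second. Running the same pairing over a longer stretch of the log might show whether the flaps cluster around particular nodes or times of day.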
This issue started after we upgraded from v1.6.5 to v1.7.1.1.
This cluster is running on four Windows Server 2008 R2 Enterprise machines. They are up to date on service packs and hotfixes. The machines have not been rebooted in a couple of months.
We are considering going back to v1.6.5.
Ideas? Thoughts? Questions?
Any help would be greatly appreciated!
-Kevin