Membase Server 1.6.5.3 unable to start after joining cluster
Hi,
I was trying to do a rolling upgrade of our Membase servers from v1.6.5.0 to v1.6.5.3. My server runs Windows Server 2008 R2 Standard (x64). After Membase Server v1.6.5.3 was installed and the node was joined to the cluster, this error message was shown in the Membase Console:
Attention - Join completion call failed. Failed to start ns_server cluster processes back. Logs might have more details.
After a few seconds, the server crashed and the Membase Console could no longer be displayed. The MembaseServer service was then unable to restart successfully.
Here are two log messages from my Event Log Viewer whose source is ErlSrv:
-------------------
1. MembaseServer: Using TerminateProcess to kill erlang.
2. MembaseServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
-------------------
I tried uninstalling Membase Server v1.6.5.3 and reinstalling v1.6.5.0. Membase Server v1.6.5.0 can join the cluster successfully. Does that mean these two versions are incompatible?
Can this server join the cluster or not? Those two versions should be compatible...
Perry
Will, did you follow the upgrade instructions explicitly described here: http://techzone.couchbase.com/wiki/display/membase/Membase+Server+1.6.5.3?
Hi Perry,
I did follow the instructions to upgrade my servers, step by step.
I originally had two nodes:
Node1: Membase Server v1.6.5.0
Node2: Membase Server v1.6.5.0
I decided to upgrade Node2 first.
I removed Node2 from Node1.
I reinstalled Node2 with Membase Server v1.6.5.3.
Then joining Node2 to Node1 simply failed!
I think Node1 still contains some information about Node2 that prevents Node2 from joining Node1 successfully.
Do you have any idea?
Hi Perry,
I'm pretty sure that Membase Server v1.6.5.3 is incompatible with v1.6.5.0: joining one node to the other causes the node being added to crash, and its server cannot be started again!
I tested this many times. Here is what I tested:
Node1: Membase Server v1.6.5.3
Node2: Membase Server v1.6.5.0
I tried joining Node1 to Node2 (that is, adding the Node1 server into Node2); Node1 then stays dead for good.
I also tried joining Node2 to Node1 (that is, adding the Node2 server into Node1); Node2 then stays dead for good.
Here is the log shown on the server that was being joined to:
Failed to add node 172.16.1.1:8091 to cluster. Join completion call failed. Failed to start ns_server cluster processes back. Logs might have more details.
I tested this on 4 servers, many times. The result was always the same. Could you give it a try?
By the way, I'm using Windows Server 2008 R2 x64.
Yes, sorry about that Will. We discovered this bug just recently and will be sending out a patch shortly.
My apologies.
Perry
Hey Will, this has been resolved and you can find more information about applying the patch at: http://techzone.couchbase.com/issues/browse/MB-3554
Perry
Confirmed. Thanks.
After I deleted the "C:\Program Files\Membase\Server\config\ns_1\config.dat" file, the MembaseServer service was able to start.
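The workaround above deletes the node's configuration file outright. A more cautious variant is to move the file aside under a timestamped name, so it can be restored if needed. The following is only a minimal sketch in Python: the helper name move_config_aside is hypothetical, it is not part of Membase, and it assumes the MembaseServer service has already been stopped by hand before the file is touched.

```python
import shutil
import time
from pathlib import Path


def move_config_aside(config_path):
    """Rename a config file to a timestamped backup and return the new path.

    Hypothetical helper, not a Membase tool. It assumes the service that
    owns the file has been stopped before this is called.
    """
    src = Path(config_path)
    if not src.is_file():
        raise FileNotFoundError(f"no such file: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = src.with_name(f"{src.name}.{stamp}.bak")
    # Moving (rather than deleting) keeps the old contents recoverable.
    shutil.move(str(src), str(backup))
    return backup


# Example call, using the path quoted in the post above:
# move_config_aside(r"C:\Program Files\Membase\Server\config\ns_1\config.dat")
```

Whether removing config.dat is a safe fix at all is not confirmed anywhere in this thread; it is simply what worked on this one node.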
I originally had two nodes in my cluster. Node2 was removed before I did the rolling upgrade, so the Membase Console on Node1 doesn't show Node2 on the MANAGE/Servers list page.
I saw some weird log entries on the MONITOR/Log page, shown below:
----------------------------------------------------------------
Failed to add node 172.16.1.2:8091 to cluster. Prepare join failed. Error etimedout happened during REST call post to http://172.16.1.2:8091/engageCluster2.
Node 'ns_1@172.16.1.1' saw that node 'ns_1@172.16.1.2' went down.
Node 'ns_1@172.16.1.1' saw that node 'ns_1@172.16.1.2' came up.
Failed to add node 172.16.1.2:8091 to cluster. Join completion call failed. Failed to start ns_server cluster processes back. Logs might have more details.
Failed to add node 172.16.1.2:8091 to cluster. Prepare join failed. Got HTTP status 500 from REST call post to http://172.16.1.2:8091/engageCluster2. Body was: "[\"Unexpected server error, request logged.\"]"
Node 'ns_1@172.16.1.1' saw that node 'ns_1@172.16.1.2' went down.
...
...
...
----------------------------------------------------------------
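When the log repeats like this, it can help to count which stage of the join fails, and for which node. The following is a minimal sketch in Python that tallies the "Failed to add node" lines by node address and stage. The regular expression is based only on the message wording shown above, and the function name summarize_join_failures is made up for illustration; other Membase versions may word these messages differently.

```python
import re
from collections import Counter

# Matches only the two failure stages that appear in the log excerpt above.
FAILED_ADD = re.compile(
    r"Failed to add node (?P<node>\d+\.\d+\.\d+\.\d+:\d+) to cluster\. "
    r"(?P<stage>Prepare join failed|Join completion call failed)"
)


def summarize_join_failures(lines):
    """Count join failures per (node, stage) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_ADD.search(line)
        if match:
            counts[(match.group("node"), match.group("stage"))] += 1
    return counts


# Sample lines copied from the log excerpt above.
SAMPLE_LOG = [
    "Failed to add node 172.16.1.2:8091 to cluster. Prepare join failed. "
    "Error etimedout happened during REST call post to "
    "http://172.16.1.2:8091/engageCluster2.",
    "Node 'ns_1@172.16.1.1' saw that node 'ns_1@172.16.1.2' went down.",
    "Failed to add node 172.16.1.2:8091 to cluster. Join completion call "
    "failed. Failed to start ns_server cluster processes back. Logs might "
    "have more details.",
    "Failed to add node 172.16.1.2:8091 to cluster. Prepare join failed. "
    "Got HTTP status 500 from REST call post to "
    "http://172.16.1.2:8091/engageCluster2.",
]

for (node, stage), count in sorted(summarize_join_failures(SAMPLE_LOG).items()):
    print(f"{node}  {stage}: {count}")
```

The "went down" and "came up" lines are deliberately ignored here; they could be counted the same way with a second pattern.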
How can I solve this problem? Thank you.