MembaseServer won't start unless other cluster instance is running
I installed MembaseServer 1.0.3 on two machines, both running Win7 x64. On one machine I installed the x64 build of MembaseServer (Instance A), and on the other the x86 build (Instance B). I then joined Instance B to Instance A to form a cluster.
When the machine with Instance A is down, I cannot start up Instance B. I get the error: "MembaseServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option."
When I start up Instance A, I am then able to successfully start up Instance B.
I suspect this doesn't have anything to do with x64 vs x86, but that by joining Instance B to Instance A, it is somehow now dependent on Instance A being up before it can start. After joining the cluster, I would have expected these instances to start up successfully independently. If I'm in Production and Instance A is down for an extended period, I would hope that the cluster would still run with just Instance B.
I then removed Instance B from the cluster using the console on Instance A. I still can't start up Instance B independently. Maybe there's some leftover cluster configuration that I have not been able to find.
Thanks for the update, Bernard. That is certainly not the designed behavior: any member of a cluster should be able to act on its own when another member is down.
If you can reproduce the issue, I would certainly like to engage with you and get the log output or steps that you used so I can see about diagnosing and fixing it.
Perry
I created a log of my attempts to bring up Instance B while Instance A was not running. Would you like me to email it to you?
Sure Bernard, email it to perry -at- northscale -dot- com and I'll take a look.
Thanks!
It's on its way. Thanks for your help.
Thanks Bernard, there's definitely a bug here. I've filed it and we will evaluate it for an upcoming release.
The issue is that our current code tries to ping the IP address of the other node in the cluster and won't start if it can't ping that address. Note that if the machine is alive but the memcached service on it is down, the other node will still start just fine.
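The startup check described above can be illustrated with a small toy model. This is not Membase's actual (Erlang) code; `PeerState` and `startup_allowed` are hypothetical names used only to show why a host-level ping gates startup while the state of the memcached service on that host does not.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PeerState:
    host_up: bool     # the peer machine answers a ping
    service_up: bool  # the memcached/Membase service on it is running


def startup_allowed(peers: Sequence[PeerState]) -> bool:
    # Toy model of the reported behavior: only host reachability is
    # checked, so service_up is never consulted.
    return all(p.host_up for p in peers)


# Instance A's machine is off: Instance B refuses to start.
print(startup_allowed([PeerState(host_up=False, service_up=False)]))
# Instance A's machine is on but its service is stopped: Instance B starts.
print(startup_allowed([PeerState(host_up=True, service_up=False)]))
```

The second case is why the problem only shows up when the whole machine hosting the other node is unreachable, not when just its service is stopped.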
Hope this doesn't impact your ability to use our server too much...
Please let me know if there's anything else I can do for you.
Perry
Also, Bernard, the issue here should only affect starting the service up. If you have a cluster of 2 nodes running and one goes down, the other should be able to continue servicing requests.
Please let me know if that is not the case, as that would be a different bug that needs investigating.
Thanks.
Perry
After removing Instance B from the cluster, I was still having trouble starting it independently. I ended up deleting config.dat, and the service generated a new one when I restarted it.
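A slightly safer variant of this workaround is to move config.dat aside instead of deleting it, so the old file can be restored if needed. The sketch below assumes the service has already been stopped and that you know where config.dat lives (the thread does not give its location); `reset_cluster_config` is a hypothetical helper, and the regeneration of a fresh config.dat on restart is only as reported above.

```python
from pathlib import Path


def reset_cluster_config(config_path: str) -> Path:
    """Rename config.dat to config.dat.bak so the service can
    regenerate a fresh config.dat on its next start.

    Assumes the MembaseServer service is stopped first. Raises
    FileNotFoundError if config_path does not exist. An existing
    .bak file is overwritten.
    """
    src = Path(config_path)
    backup = src.with_name(src.name + ".bak")
    src.replace(backup)  # overwrites any previous backup
    return backup
```

Restoring the old configuration is then just renaming config.dat.bak back to config.dat while the service is stopped.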
I'm still concerned that one instance is dependent on another to even start. It doesn't match what I would want from a cluster where any particular instance could be down and the rest would still be able to start up.