MembaseServer Won't Start
We have installed Membase on a development Windows 2003 server for prototyping. We wanted to see what would happen if you just stopped the service on one of the servers in a cluster so I did. Now, when I try to start the service back up, I receive the following errors in the event log:
MembaseServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
then
The MembaseServer service terminated with the following error:
The process terminated unexpectedly.
Any ideas as to why this would be happening?
Hi,
Looking at the code, this would seem to indicate the expected processes are starting initially, but then fail to continue running. Is there, by chance, any unusual way the path to the installation directory has been set up?
I'll send you a private message with some contact info and some details on gathering some diagnostic info.
Thanks,
Matt
Did this get resolved?
I've just installed MCDS and I'm getting the same error when the service tries to start?
Thanks,
Chris.
Hi Chris,
What we figured out is that this could happen if the configured paths get messed up. That is, the Erlang system is trying to start its NT Service but can't find the right files because incorrect registry entries make it look in the wrong place.
One way this might happen is if there were multiple installation attempts, especially if the different attempts used differing INSTALLDIR locations. Perhaps, in a second install attempt, you changed your mind and decided to put the software in a directory other than the default (c:\Program Files\NorthScale\...). The best way around this if you're attempting re-installations is to do a clean uninstall first -- use Windows' Control Panel -> Add/Remove Programs, etc.
Another possibility is that the file permissions got mismatched. For example, did you install the software as one user but run it as a different user that doesn't have the right file privileges?
Thanks for any info you can share to help nail this!
Cheers,
Steve
Steve,
Thanks for the info, looks like you hit the nail on the head. I have only installed the application once, on my development XP machine, so discounted the path issue.
I gave all users of the machine full rights to the default installation directory & sub-directories, and removed any read-only attributes, and the service then started successfully.
Thanks again.
Chris.
Glad you are having success!
I am getting the same error messages in the event logs. I successfully installed the server to our development machine and it runs great over there. I installed to production and the service fails to start with:
NorthScaleServer
Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
Followed by:
The NorthScaleServer service terminated with the following error:
The process terminated unexpectedly.
Both servers are Windows 2008 R2. I installed on the development machine with domain admin rights and on the prod with local admin rights. I tried changing the permissions on the install directory to allow all users access and it still fails. The service is running as "Local System". I tried running the service under my account (same account that did the install) and it got the same error. Is there anything else I can look at to troubleshoot this?
Hi, I'd like to get some logs from you to help us solve this, and will private message you with the instructions.
Thanks,
Steve
I am having this problem. Here is what I did:
1. Installed Northscale Memcached Server on two servers (we'll call them A and B).
2. Went to B, did Join Cluster and specified A's IP address.
3. Web Console wouldn't come back up, I had to keep refreshing the browser because it timed out.
4. Finally it came up, said I joined correctly.
5. The cluster status was Critical and said 2 server nodes managing 1 bucket.
6. I went to A and the cluster status was Healthy, but said only 1 server node managing 1 bucket.
7. I restarted the Northscale service on both A and B.
8. It started fine on A, still says 1 server node managing 1 bucket.
9. It no longer starts on B, with the error "Erlang machine stopped instantly" in the event log.
Other notes:
I enabled ports 8080, 11211, and 11212 in Windows Firewall with a scope of my subnet on both machines. I can verify that this is working by going to B and seeing the web console on A.
So, after this I uninstalled on both machines, then re-installed. I have not changed anything during the install, I kept the defaults. Tried again, still fail.
How do I get this going?
-Corey
Additional info:
I uninstalled on both servers, then turned off Windows Firewall, then re-installed, then joined B to A, and it worked perfectly.
So Windows Firewall is getting in the way even though I have opened ports 8080, 11211, and 11212. So, what additional ports need to be opened? Or what else needs to be done with Windows Firewall configuration to get this working?
Thanks.
-Corey
Oh, and we opened the ports for TCP. Is that right?
I'll check the docs to see if I can find the answer, but someone from support who knows and can respond quickly would be fantastic. Thanks!
-Corey
As covered in the docs, TCP ports 11211, 11212, 4369, 8080 and from 21100 to 21199 should be open between the nodes of the cluster with a default configuration. You can change which HTTP and memcached services ports you use.
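A quick way to see whether those ports are reachable from one node to another is a small connection test. The following is only a rough sketch in Python, assuming Python 3 is available on the machine; the function name closed_ports is made up for illustration, and the port list is simply the one quoted above. Note that a port showing up as closed may just mean nothing is listening on it at that moment (for example, not every port in the 21100 to 21199 range is necessarily in use), so treat the result as a hint rather than proof of a firewall problem.

```python
import socket

# TCP ports quoted above for a default configuration.
DEFAULT_PORTS = [11211, 11212, 4369, 8080] + list(range(21100, 21200))


def closed_ports(host, ports=DEFAULT_PORTS, timeout=1.0):
    """Return the ports on `host` that did not accept a TCP connection."""
    failed = []
    for port in ports:
        try:
            # A successful connect means something is listening and reachable.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Refused, timed out, or filtered: record the port as closed.
            failed.append(port)
    return failed
```

Run from node B, something like closed_ports("address-of-node-A") would list the ports that could not be reached on A.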
Thanks.
I configured those ports. I was then able to add all of my servers to the cluster. However, when I go to create a bucket, I get this error:
An error was encountered when requesting data from the server. The console has been reloaded to attempt to recover. There may be additional information about the error in the log.
I checked the log and don't see anything useful that would tell me what the problem is. Here is a sample entry (there are lots of them):
Server error during processing: ["web request failed", {path, "/pools/default/bucketsStreaming/WebConsole"}, {type,error}, {what,{case_clause,#Ref<8814.0.18.208232>}}, {trace, [{menelaus_util,expect_prop_value, ["WebConsole", [{"default", [{auth_plain,undefined}, {size_per_node,64}]}]]}, {menelaus_web,checking_bucket_access,4}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}]
The cluster says it is Healthy with 14 nodes managing 1 bucket (the default).
-Corey
I disabled the Windows Firewall on all of these servers, tried again, and still fail.
So something unrelated to the firewall is causing a problem.
-Corey
Ok, so the next thing I tried was to go to each server and Leave the cluster. I did that. All of them but one actually left the cluster. Half of them showed an error message about getting data after successfully leaving the cluster. So I restarted the web console on all of them and that removed the error message.
The server that thinks it is still part of the cluster shows that there are 3 servers in the cluster. 2 of the 3 show that they are "Down" so I go to those servers and they do not believe they are part of the cluster. So I go back to the server that still thinks it is part of the cluster and try to leave the cluster but he can't.
So now I am stuck with one server thinking there are 3 in its cluster, 2 of which it thinks are down, and he himself won't leave the cluster. Or at least won't decide to believe he is on his own cluster or that the down servers have left.
---
Ok, I found the remove server from cluster link hidden in the expando-server list that appears below the status. I removed them, now they are all on their own cluster.
---
I will try adding them all to the same cluster again without any firewall stuff and see what happens.
---
Instead of poking port holes in Windows Firewall, if I wanted to simply grant access to the specific executables, which executables would I grant access to? I did not find this in the doc.
-Corey
Ok, I got this working by starting over with Windows Firewall disabled. So I double-dog checked my firewall settings and they are correct. So either the port list you gave me was not comprehensive, or there is a bug somewhere in the product, or double-dog check fail.
Regardless, firewall config would probably be made much simpler by my simply unblocking specific executables, if you could tell me what those are. I checked for all .exe's in the directory structure and there are a lot there. So if I knew which ones were the ones that use the network then I could unblock those specific .exe's.
I would like to get this working without Windows Firewall disabled but at this point I haven't had much luck even with the correct firewall settings.
Thanks.
-Corey
Good point. For port 11211 it's memcached.exe and for 11212, 4369, 8080 and from 21100 to 21199 it's erl.exe. We should add that to the documentation.
I unblocked these programs:
C:\Program Files\Northscale\Memcached Server\bin\memcached\memcached.exe
C:\Program Files\Northscale\Memcached Server\bin\erlang\bin\erl.exe
C:\Program Files\Northscale\Memcached Server\bin\erlang\erts-5.7.4\bin\erl.exe
When I load the console (on one of the servers), I get the "An error was encountered...the console has been reloaded..." errors. Then I get the "...application received multiple invalid responses from the server...reloading the application has been suppressed" error dialog.
Are there any other .exe's that need to be unblocked? Perhaps erlsrv.exe?
Thanks.
-Corey
It's not working because erl.exe runs in svchost.exe. Hmm. What to do now...
...I would be interested in knowing if others have been successful using Northscale Memcached Server behind Windows Firewall.
I will try opening specific ports one more time, perhaps I got something wrong.
-Corey
Running windows server 2003 SP2
I get the same error message after trying to start the service on a production machine.
Install and service worked fine in development.
Message in logs:
NorthScaleServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
Installed with Admin account and running as Local System, windows firewall not installed.
Any ideas on how to get the service started?
Thanks.
dolmeck, is it possible that the "Local System" account doesn't have enough permissions? Can you try running it as Admin first to confirm that the installation is set up properly?
Thanks!
Perry
Thanks for your reply,
Unfortunately, I get the same problem when trying to run the service as an Administrator.
dolmeck, can you upload a dump of the logs? You should be able to run dump_logs.bat and send that over.
Thanks!
Perry
dolmeck, that error is being generated because, while the process is starting up, it is finding another Erlang process running. It's possible that either an existing installation is still there, or that there was a problem with one of the processes and it hasn't shut down completely. While the service is still shut down, can you take a look at your task manager and see if there are any erl.exe/epmd.exe processes running? Feel free to send a dump of the full list of processes running and I'll take a look through.
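For anyone who would rather script that check than scan the task manager by eye, here is a minimal sketch in Python. It assumes Python 3.7 or later is installed on the machine and relies on the standard Windows command "tasklist /FO CSV /NH"; the function names are made up for illustration and are not part of the product.

```python
import csv
import io
import subprocess

# Image names of the Erlang processes mentioned above.
ERLANG_IMAGES = {"erl.exe", "epmd.exe"}


def find_erlang_processes(tasklist_csv):
    """Return (image name, pid) pairs for erl.exe/epmd.exe rows in CSV text."""
    found = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # tasklist CSV rows start with the image name followed by the PID.
        if len(row) >= 2 and row[0].lower() in ERLANG_IMAGES:
            found.append((row[0], int(row[1])))
    return found


def running_erlang_processes():
    """Windows only: list all processes with tasklist and filter them."""
    output = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_erlang_processes(output)
```

With the service stopped, an empty result from running_erlang_processes() would suggest no leftover Erlang processes; anything else is worth a closer look.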
Thanks!
Perry
Hello Perry,
I've included a dump file that also has a list of processes running. I did find the epmd.exe running, so I stopped it and tried starting the service again with the same error reported:
NorthScaleServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
Also with an Error 1067: Process terminated unexpectedly
Hope this helps.
Thanks.
Thanks for that dolmeck...the dump.txt was quite helpful. It looks like something we haven't seen before and I will pass this along to the engineering team.
As a workaround, can you try the following:
1. from "INSTALLDIR\Memcached Server\bin"
2. run "service_stop.bat"
3. run "service_unregister.bat"
4. run "service_register.bat ns_1@"
5. run "service_start.bat"
After the "@" in step 4, please put in the IP address on this server that you want our service to bind to.
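To make the order of the steps explicit, here is a small sketch in Python that only builds the command lines; it does not run anything. It assumes the node name is "ns_1@" followed by the IP address, which is how the instructions above read, and the function name workaround_commands is invented for this example.

```python
import ntpath  # Windows path rules, regardless of where this runs


def workaround_commands(bin_dir, ip_address):
    """Return the batch-file command lines from the steps above, in order."""
    # Assumption: the node name is "ns_1@" followed by the IP to bind to.
    node_name = "ns_1@" + ip_address
    steps = [
        ["service_stop.bat"],
        ["service_unregister.bat"],
        ["service_register.bat", node_name],
        ["service_start.bat"],
    ]
    # Prefix each script with the bin directory it lives in.
    return [[ntpath.join(bin_dir, step[0])] + step[1:] for step in steps]
```

Printing the result for your own INSTALLDIR and IP address gives four commands that can be reviewed and then run by hand from a command prompt.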
Thanks again, let me know how that works.
Perry
Unfortunately, we are still having the same problems.
I've attached the erl crash dump file in case that sheds some light on the issue.
Unfortunately, that crash dump doesn't really give us many clues as to what's going on. Any chance you could try this on a new machine with a fresh install?
Thanks.
Perry
Hello Perry,
We've gotten the service to install and run successfully on 4 other servers with similar setups (Win 2003 server sp2). Now, windows firewall is disabled on the problem server, however, it is still behind a firewall. Could not opening the required ports on this server prevent the service from running? Or, does the error "NorthScaleServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option." indicate a different issue?
Thanks.
I believe it does indicate a different issue. The service should be able to start up locally regardless of whether it can talk outside of the machine, so the firewall doesn't really come into play (at this point at least). It's also very good to know that you were able to successfully install it on similar machines.
Can you try completely uninstalling and reinstalling on this problem machine? After uninstalling (via add/remove programs), please check as best you can that everything is removed (the service, any registry entries, any files)
Thanks!
Perry
Thanks for your reply, Perry. I did an uninstall and made sure the files, service, and registry were clear. I then did a re-install of the application (as an Admin). It was successful as before; however, the service still does not want to start, reporting the same error as before. Any ideas are appreciated. Thanks!
Is there anything different about this node from the others that you have it installed on?
What does the networking configuration look like (single IP, multi-homed?)
Can you give me an updated list of processes as you did before?
Thanks
Perry
Hi guys,
I am experiencing the same problem locally (windows 7 64bit), luckily not in prod, hope this issue can be resolved soon.
Event logs: NorthScaleServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
Symptoms: Service won't start - stop, unregister, register IP, start does not fix this issue. Complete reinstall fixes this issue.
This seems to happen after putting machine in sleep mode.
I am also experiencing this, only on my dev machine. Windows 7, 64-bit. Tried uninstalling, then installing. It started for a day and now it doesn't start again.
I'm also getting this error on Windows 7 64 bit. My dev environment is Java development with Apache Tomcat. The problem happens when Apache Tomcat starts up and then it stops the NorthScale process. I've tried to go into Windows Control Panel and restart the service but it still won't start up at all (I think there is an unclean shutdown).
The answer to my problem is to first install NorthScale, log into the console and change the port to 8070 and then save. So far it's working for me.
Interesting albert, thanks for the response.
Corey, is it at all possible that you have something else running on port 8080?
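One way to check for that kind of conflict before starting the service is to probe the port locally. This is just a sketch in Python, assuming Python 3 is available; port_in_use is a made-up helper name, and it only tells you that something is listening, not which program it is.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0
```

With the NorthScale service stopped, port_in_use(8080) returning True would point to another program (Tomcat, for instance) already holding the Web Console's default port.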
Perry
I'm having exactly the same problem on a Windows Vista dev box:
- Service starts and immediately stops.
- "Erlang machine stopped instantly (distribution name conflict?). " in event log.
- Using the batch files to unregister and re-register the service has no effect.
- A completely fresh install fixes the issue.
- No other erlang processes are running on the machine although, after trying to start the service, epmd.exe is left behind.
- The problem occurs fairly consistently following a system sleep. Once or twice it's happened following a normal shutdown.
- No OS permission or TCP port collision issues that I can detect.
Glenn
Just want to add that I have exactly the same issue as Glenn Davies above forcing me to un-install / re-install on the dev box (Win 7 x64) every couple of days...
I'm running version 1.0.3 x64 with a single, local instance.
Glenn, does it only happen with sleeping, or occasionally with a shutdown as well? I will try to reproduce, but want to make sure that the server is otherwise stable for you.
macinnesm, are you sleeping/shutting down as well to cause the problem?
Thanks
Perry
alberttwong, if you stop the Apache service and start the NorthScale service you should be able to change the port that the Web Console runs on (via Cluster Settings in the UI). Let me know if that helps you run both side-by-side or if there is any other issue.
Perry
Hi Perry,
My dev machine stays on; it hasn't shut down or gone into sleep/hibernate mode. I have locked the console, but that's it.
It seems to last about 2-3 days and then dies. You can't even visit the web server at http://localhost:8080/index.html
Marcus
It's very weird that it would die after a certain amount of time. Can you grab the "dump_logs" output and send that over?
...send it where? Have you got an email address I can send to?
Sure thing...perry -at- northscale -dot- com
Thanks!
Perry
I am having the same error when I restart my windows 2003 box. I have to manually start the NorthScale service, and then it does run.
I checked file permissions on the install directory, and gave everyone full control, but the problem persists.
Also, I am pretty sure there is only a single install on this machine.
Any ideas?
What's the specific error you are receiving? Can you paste me the output of your task manager showing the processes that are running?
Here is the first third of the list:
smss.exe \SystemRoot\System32\smss.exe 344
csrss.exe C:\WINDOWS\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off MaxRequestThreads=16 396
winlogon.exe winlogon.exe 424
services.exe C:\WINDOWS\system32\services.exe 472
lsass.exe C:\WINDOWS\system32\lsass.exe 484
svchost.exe C:\WINDOWS\system32\svchost.exe -k DcomLaunch 640
svchost.exe C:\WINDOWS\system32\svchost.exe -k rpcss 728
svchost.exe C:\WINDOWS\system32\svchost.exe -k NetworkService 792
svchost.exe C:\WINDOWS\system32\svchost.exe -k LocalService 828
svchost.exe C:\WINDOWS\System32\svchost.exe -k netsvcs 844
spoolsv.exe C:\WINDOWS\system32\spoolsv.exe 1008
msdtc.exe C:\WINDOWS\system32\msdtc.exe 1032
cissesrv.exe "C:\Program Files\HP\Cissesrv\cissesrv.exe" 1168
cpqrcmc.exe C:\WINDOWS\system32\cpqrcmc.exe 1200
vcagent.exe C:\hp\hpsmh\data\cgi-bin\vcagent\vcagent.exe 1216
dns.exe C:\WINDOWS\System32\dns.exe 1240
svchost.exe C:\WINDOWS\System32\svchost.exe -k WinErr 1264
hudson.exe "c:\Users\hudson\hudson.exe" 1464
inetinfo.exe C:\WINDOWS\system32\inetsrv\inetinfo.exe 1572
java.exe "java" -Xrs -Xmx256m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "c:\Users\hudson\hudson.war" --httpPort=8080 1616
jqs.exe "C:\Program Files\Java\jre6\bin\jqs.exe" -service -config "C:\Program Files\Java\jre6\lib\deploy\jqs\jqs.conf" 1628
MsDtsSrvr.exe "C:\Program Files\Microsoft SQL Server\90\DTS\Binn\MsDtsSrvr.exe" 1676
sqlservr.exe "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe" -sMSSQLSERVER 1976
msmdsrv.exe "C:\Program Files\Microsoft SQL Server\MSSQL.2\OLAP\bin\msmdsrv.exe" -s "C:\Program Files\Microsoft SQL Server\MSSQL.2\OLAP\Config" 1992
svchost.exe C:\WINDOWS\system32\svchost.exe -k regsvc 2188
tcpsvcs.exe C:\WINDOWS\system32\tcpsvcs.exe 2232
sqlbrowser.exe "C:\Program Files\Microsoft SQL Server\90\Shared\sqlbrowser.exe" 2248
epmd.exe C:\PROGRA~1\NORTHS~1\MEMCAC~1\bin\erlang\ERTS-5~1.4\bin\epmd -daemon 2260
sysdown.exe C:\WINDOWS\system32\sysdown.exe 2280
smhstart.exe C:\hp\hpsmh\bin\smhstart.exe
And here is the rest:
wins.exe C:\WINDOWS\System32\wins.exe 3112
msftesql.exe "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\msftesql.exe" -s:MSSQL.1 -f:MSSQLSERVER 3180
svchost.exe C:\WINDOWS\System32\svchost.exe -k iissvcs 3228
hpsmhd.exe C:\hp\hpsmh\bin\hpsmhd.exe -fC:/hp/hpsmh/conf/smhpd.conf 3608
cmd.exe C:\WINDOWS\system32\cmd.exe /C "C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/error_log 5M" 3712
rotatelogs.exe C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/error_log 5M 3720
cmd.exe C:\WINDOWS\system32\cmd.exe /C "C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/access_log 5M" 3728
rotatelogs.exe C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/access_log 5M 3736
hpsmhd.exe C:\hp\hpsmh\bin\hpsmhd.exe -d C:/hp/hpsmh -f C:/hp/hpsmh/conf/smhpd.conf 3804
rotatelogs.exe C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/error_log 5M 3832
cmd.exe C:\WINDOWS\system32\cmd.exe /C "C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/error_log 5M" 3848
rotatelogs.exe C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/error_log 5M 3892
cmd.exe C:\WINDOWS\system32\cmd.exe /C "C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/access_log 5M" 3900
rotatelogs.exe C:\hp\hpsmh\bin\rotatelogs.exe -f C:/hp/hpsmh/logs/access_log 5M 3908
SQLAGENT90.EXE "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\SQLAGENT90.EXE" -i MSSQLSERVER 3984
svchost.exe C:\WINDOWS\System32\svchost.exe -k termsvcs 4572
alg.exe C:\WINDOWS\System32\alg.exe 4708
wmiprvse.exe C:\WINDOWS\system32\wbem\wmiprvse.exe 4976
logon.scr logon.scr /s 5208
csrss.exe C:\WINDOWS\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off MaxRequestThreads=16 5704
winlogon.exe winlogon.exe 5732
rdpclip.exe rdpclip 5900
explorer.exe C:\WINDOWS\Explorer.EXE 6016
cpqteam.exe "C:\Program Files\HP\NCU\cpqteam.exe" 6108
jusched.exe "C:\Program Files\Common Files\Java\Java Update\jusched.exe" 6136
ctfmon.exe "C:\WINDOWS\system32\ctfmon.exe" 2396
cmd.exe "C:\WINDOWS\system32\cmd.exe" 2328
wmic.exe WMIC /OUTPUT:C:\ProcessList.txt PROCESS get Caption,Commandline,Processid 1088
wmiprvse.exe C:\WINDOWS\system32\wbem\wmiprvse.exe 2020
I see there is at least one northscale process running:
epmd.exe C:\PROGRA~1\NORTHS~1\MEMCAC~1\bin\erlang\ERTS-5~1.4\bin\epmd -daemon 2260
The first error is:
NorthScaleServer: Erlang machine stopped instantly (distribution name conflict?). The service is not restarted, ignoring OnFail option.
I have found that I am only unable to restart the service AFTER I have joined another server to the cluster. After the server is added and you try to restart the northscaleserver via services.mmc, it fails.
Any ideas?