Membase service needs to be restarted frequently
We've successfully upgraded our numerous Windows servers to 1.7.1, thanks to Perry's patch.
Our membase servers are on the same machines as our web servers.
Right now our web apps don't talk to Membase. We're in a testing/migration phase where the Membase services are up and running, but we only hit them with a test app every minute or so and, from time to time, with a test page that tries to Set/Get a value.
Basically an extremely small, not to say insignificant, amount of traffic for 9 clustered Membase servers.
However, quite frequently (about 2 to 3 times a week), one of the 9 Membase servers, seemingly at random, gets into a funky state where the service is up but the Membase client cannot Set or Get a value on it.
The solution so far has been to RDP into that server and restart the Membase service. That solves the issue, at least until another server gets into that funky state.
Our test app, which uses the same Membase client and has logging turned on, gives us the following information:
---------------------------------
2011-09-06 13:51:17,488 [18] ERROR Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl Could not init pool.
System.TimeoutException: Could not connect to 10.2.5.10:11210
at Enyim.Caching.Memcached.PooledSocket.ConnectWithTimeout(Socket socket, IPEndPoint endpoint, Int32 timeout)
at Enyim.Caching.Memcached.PooledSocket..ctor(IPEndPoint endpoint, TimeSpan connectionTimeout, TimeSpan receiveTimeout)
at Enyim.Caching.Memcached.MemcachedNode.CreateSocket()
at Enyim.Caching.Memcached.Protocol.Binary.BinaryNode.CreateSocket()
at Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl.CreateSocket()
at Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl.InitPool()
---------------------------------
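The exception above means a TCP connection to the node's data port (11210) did not complete within the client's connection timeout, and it coincides with the node showing `IsAlive` = FALSE on our test page. The same symptom can be watched for from outside the web app. A minimal sketch in Python, assuming that a completed TCP handshake is a good enough proxy for the client's view of the node (it does not prove that Set/Get work); the function name and the 2-second default timeout are hypothetical choices:

```python
import socket

DATA_PORT = 11210  # Membase data port seen in the client log above


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    This mirrors the connect-with-timeout step that fails in the log above.
    It only checks that something is listening on the port; it does not
    send any Membase/memcached command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

Run from another machine against each address in web.config, for example `can_connect("10.2.5.10", DATA_PORT)`; during an incident like the one logged above it would be expected to return `False`.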
Our test page, which displays various properties exposed by the Membase client, does show that the problematic Membase server has its "IsAlive" flag set to FALSE when this issue occurs.
The one thing I haven't tried yet is connecting to the web console on that host when this occurs, to see what happens; I'll update this post the next time it happens.
We really need a more stable Membase cluster before we can start directing actual traffic to it, but I'm not sure where to look to troubleshoot this issue...
Hoping someone can help.
Thanks.
How is your client configured to connect to the Membase cluster? Does it connect to the node directly or through some other load balancer?
We have the direct IP addresses set in the web.config file of our web site. The Membase servers are accessed directly.
As soon as the problem occurs again, I will run those two checks (the web UI and mbstats) and let you know how it went. Thanks for your help.
Just happened again. Here is what happens when I access the web console:
------------------------------
Unable to connect
Firefox can't establish a connection to the server at 10.2.5.9:8091.
* The site could be temporarily unavailable or too busy. Try again in a few moments.
* If you are unable to load any pages, check your computer's network connection.
* If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.
If/when this occurs again, can you:
- Log onto the WebUI of the problematic node and report whether it is working properly
- Run this command: "C:\Program Files\Membase\Server\bin\mbstats <IP>:11210 all", where <IP> is the IP address of the node that is having trouble
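To our understanding (this is not stated in the thread), `mbstats` talks to the node's data port using the memcached binary protocol: it sends one STAT request (opcode 0x10) with a 24-byte header, and the server answers with one response packet per statistic, ending with a packet whose key is empty. A minimal sketch of that exchange in Python, assuming the standard binary-protocol header layout; the function names are hypothetical, and only packet encoding/decoding is shown so it can be checked without a live node:

```python
import struct

# Binary protocol header: magic, opcode, key length, extras length,
# data type, vbucket id (request) / status (response), total body length,
# opaque, CAS. Big-endian, 24 bytes in total.
HEADER = struct.Struct(">BBHBBHIIQ")
REQ_MAGIC, RES_MAGIC = 0x80, 0x81
OP_STAT = 0x10


def build_stat_request(group: str = "", opaque: int = 0) -> bytes:
    """Encode a STAT request; an empty group asks for the default stats."""
    key = group.encode("ascii")
    header = HEADER.pack(REQ_MAGIC, OP_STAT, len(key), 0, 0, 0,
                         len(key), opaque, 0)
    return header + key


def parse_stat_responses(data: bytes) -> dict:
    """Decode a buffer of STAT responses into {stat name: value}.

    Stops at the terminating packet with an empty key. Raises ValueError
    on an unexpected packet, a non-zero status, or a truncated body.
    """
    stats = {}
    offset = 0
    while offset + HEADER.size <= len(data):
        (magic, opcode, key_len, extra_len, _dtype, status,
         body_len, _opaque, _cas) = HEADER.unpack_from(data, offset)
        if magic != RES_MAGIC or opcode != OP_STAT:
            raise ValueError("unexpected packet")
        if status != 0:
            raise ValueError(f"server returned status {status:#06x}")
        body_start = offset + HEADER.size
        body = data[body_start:body_start + body_len]
        if len(body) < body_len:
            raise ValueError("truncated packet")
        key = body[extra_len:extra_len + key_len]
        value = body[extra_len + key_len:]
        offset = body_start + body_len
        if key_len == 0:
            break  # empty key marks the end of the stats stream
        stats[key.decode()] = value.decode()
    return stats
```

In a real check, the request would be written to a socket connected to `<IP>:11210` and the replies read until the empty-key packet arrives; that network part is left out here.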
Here is the result of the command:
----------------------------------------
C:\Program Files\Membase\Server\bin>mbstats 10.2.5.9:11210 all
Stats '' are not available from the requested engine.
Just to add more context: this time the issue was slightly different, in that the service actually stopped by itself.
I can't find anything related anywhere in the Windows Event Viewer though.
It just happened again on the same server, 30 minutes later, with absolutely no traffic of consequence.
So it seems like the Membase service is in fact crashing/exiting.
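Since the service appears to be exiting on its own and nothing shows up in the Event Viewer, it can help to record exactly when each node goes down and comes back, so those times can be compared against other logs on that machine. A minimal sketch in Python, assuming the periodic probe results (for example from the test app that runs every minute) are collected as `(timestamp, is_up)` samples; the function name and sample format are hypothetical:

```python
from datetime import datetime
from typing import List, Optional, Tuple

Sample = Tuple[datetime, bool]  # (time of probe, node reachable?)


def outage_windows(samples: List[Sample]) -> List[Tuple[datetime, Optional[datetime]]]:
    """Group consecutive failed probes into (first_failure, first_recovery) windows.

    The end of a window is None if the node was still down at the last sample.
    Samples are sorted by time first, so arrival order does not matter.
    """
    windows = []
    down_since = None
    for ts, is_up in sorted(samples):
        if not is_up and down_since is None:
            down_since = ts  # start of a new outage
        elif is_up and down_since is not None:
            windows.append((down_since, ts))  # node came back
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))  # still down at the end
    return windows
```

The start of each window bounds when the service stopped to within one probe interval, which narrows down which log entries are worth reading.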
Can you try downloading the latest version (1.7.1.1) which has a few more Windows fixes?
If that still doesn't work, could you run "C:\Program Files\Membase\Server\bin\mbbrowse_logs > <output file>" and send the resulting file (zipped first) to perry@couchbase.com?
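The log collection step can be scripted so the output is always captured and zipped the same way. A minimal sketch in Python, assuming it runs on the affected Windows host; the helper name and output file name are hypothetical, and the only Membase tool involved is the `mbbrowse_logs` command from the instructions above, with its standard output redirected to a file just as `>` does:

```python
import subprocess
import zipfile
from pathlib import Path
from typing import List

# Path from the instructions above; adjust if Membase is installed elsewhere.
MBBROWSE_LOGS = r"C:\Program Files\Membase\Server\bin\mbbrowse_logs"


def collect_and_zip(cmd: List[str], out_file: str) -> Path:
    """Run cmd with stdout redirected to out_file, then zip that file.

    Equivalent to `cmd > out_file` followed by zipping; returns the .zip path.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    out_path = Path(out_file)
    with out_path.open("wb") as fh:
        subprocess.run(cmd, stdout=fh, check=True)
    zip_path = out_path.parent / (out_path.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(out_path, arcname=out_path.name)
    return zip_path
```

For example, `collect_and_zip([MBBROWSE_LOGS], "membase_logs.txt")` would produce `membase_logs.txt.zip` next to the text file. If `mbbrowse_logs` turns out to be a script rather than an executable on a given installation, the command list would need its full file name or its interpreter instead.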
Thanks
Perry
Hi,
I am having the same type of issue with Membase 1.7.2r-20-g6604356.
Was a cause for the crash ever identified?
If and when you solve an issue via email, please share the solution with others on the forum.
Thanks
Thank you for the useful answers provided in this thread.
One of the first things we want to do is investigate whether or not the problem is within the Membase servers themselves or something external.
If both the web UI check and the mbstats command listed earlier in this thread work, then the issue is not within the Membase node itself.
Perry
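The triage rule in Perry's post (if both the web UI and the stats check work, look outside the node) can be written down so every node is checked the same way. A minimal sketch in Python, assuming the caller supplies a reachability probe such as a TCP connect with a timeout; note that a plain connect to port 11210 is a weaker test than `mbstats` actually returning stats, and the function name and the three result labels are hypothetical, not Membase terminology:

```python
from typing import Callable

WEB_UI_PORT = 8091  # admin console port from the Firefox error above
DATA_PORT = 11210   # data port used by the client and by mbstats


def diagnose_node(host: str, probe: Callable[[str, int], bool]) -> str:
    """Apply the thread's triage rule to one node.

    `probe(host, port)` is any reachability check. Returns:
      "external"  - both checks pass, so the problem is probably not in the node
      "node-down" - neither port answers, consistent with the service exiting
      "partial"   - only one port answers, so the node itself is misbehaving
    """
    ui_ok = probe(host, WEB_UI_PORT)
    data_ok = probe(host, DATA_PORT)
    if ui_ok and data_ok:
        return "external"
    if not ui_ok and not data_ok:
        return "node-down"
    return "partial"
```

Passing the probe in as a parameter keeps the decision logic separate from the network code, so the same rule works with a TCP check, an HTTP request to the console, or a real stats call.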