Enyim.Caching Membase client version 2.2 and server issues
Hi,
We are currently trying out the Windows distribution of Membase memcached server and are quite happy so far, except for the following minor issues. Can you please comment on them?
[Membase Client]
1) We cannot get the latest 2.2 version of Enyim.Caching to run. We consistently get socket timeout errors after the first few successful operations. The previous 2.0b version ran fine, with occasional warnings but no problems. Please see attached files for client logs.
2) One of the constructors of MembaseClient needs to be fixed (all versions). As currently written:
// Does not use value of sectionName parameter but looks for a section named "sectionName".
public MembaseClient(string sectionName, string bucketName) :
this((IMembaseClientConfiguration)ConfigurationManager.GetSection("sectionName"), bucketName) { }
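A minimal stand-alone sketch of the difference, assuming the only change needed is to pass the parameter instead of the string literal. The dictionary and the GetSection helper below are hypothetical stand-ins for the application configuration and ConfigurationManager.GetSection (which returns null when the named section does not exist); nothing here is taken from the client source.

```csharp
using System;
using System.Collections.Generic;

static class SectionLookupDemo
{
    // Hypothetical stand-in for the configuration sections of an application.
    static readonly Dictionary<string, string> Sections = new Dictionary<string, string>
    {
        { "membase", "settings for the membase section" }
    };

    // Stand-in for ConfigurationManager.GetSection: null when the section is missing.
    static string GetSection(string name)
    {
        string value;
        return Sections.TryGetValue(name, out value) ? value : null;
    }

    // Reported behaviour: the literal "sectionName" is looked up, the argument is ignored.
    public static string Reported(string sectionName)
    {
        return GetSection("sectionName");
    }

    // Proposed behaviour: the value of the parameter is looked up.
    public static string Proposed(string sectionName)
    {
        return GetSection(sectionName);
    }

    static void Main()
    {
        Console.WriteLine(Reported("membase") ?? "(null)");
        Console.WriteLine(Proposed("membase") ?? "(null)");
    }
}
```

In the real constructor, the equivalent change would presumably be GetSection(sectionName) without the quotes.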
[Membase Server]
3) Membase Server gets confused when the IP address of the host changes and can no longer detect the server nodes in the cluster (they are marked as "Down" in the web console). This happens when the host (a dev laptop) is used on two different networks. It is possibly not an issue for hosts with a static IP address. Re-registering Membase Server solves the problem.
Diagnostic report attached. 192.168.1.98 is the IP of the host when the server was installed. 192.168.1.67 is the IP when the laptop is used on another network. This diag report was taken back on 192.168.1.98, while the node appeared as "Down".
Ilhan, for your first issue, are these actually causing operations to fail, or are they just warnings in the logs?
Perry
They cause the operations to fail.
I'd like to report another strange behaviour we noticed today. Memory consumption of the Memcached.exe process increases continuously. We noticed it when it reached 1.2GB and slowed down everything else running on the same node (a computer with only 2GB of RAM). There was only one bucket on the node, limited to 64MB.
Next I restarted the NorthScale service and just observed the empty cache. Same behaviour. Attached are a few diag reports taken in the first 5 minutes or so. It looks like there is a discrepancy between the memory usage reported in diag reports and the task manager. For the first couple of minutes they report similar memory size, then diag report stops at a certain value while memory consumption stated in task manager keeps on increasing.
Any idea why this is happening?
Thanks.
I figured out when this memory issue started happening: It happened after I removed the node from the cluster and started using it on its own. I had backed up the original config.dat file before joining the cluster. I reverted to that and memory does not increase any more.
Attached are the config files before and after joining the cluster. You can try using config-after-cluster.dat to reproduce the problem.
Thanks ilhan, I'll take a look through those.
Did you delete the default bucket at any point? There's a known issue with memory leaking due to that...
Perry
Yes indeed, I did.
That's the problem then. We discovered it a few weeks ago...the workaround is to recreate a bucket called "default". You can make it very small, but it needs to be there by name to keep the memory from leaking. This will be fixed in an upcoming release.
Perry
Thanks Perry.
Ilhan, there was just a new release to the Enyim client (2.4) to fix your Constructor issue. Could you implement that and see if you're still having issues?
Thanks!
Perry
Hi Perry,
Constructor issue is fixed, but socket timeout issue still exists.
Are you able to reproduce the socket timeout issue?
ilhan, I have not been able to reproduce them in my lab...do you have some test code that I can run?
Perry
Hi Perry,
I emailed you the tests, since it exceeded the size limit for attachments.
Thanks a lot for your time.
I also meant to mention that after this problem occurs we need to restart the NorthScale server to get it handling operations again.
ilhan, I didn't get any email from you...
I'm attaching the test source and the config file used here then. Thanks.
Tests are very simple and run the same operations with both Enyim client and NorthScale client.
The Enyim client runs successfully; the NorthScale client fails. Try running the "NorthScaleTest" a couple of times if it doesn't fail in the first run. It will eventually fail after a few runs.
You can check the log4net.log file in the debug folder for DEBUG logs of the clients.
This problem started happening after version 2.0beta. I'm running this on 32bit Windows Vista.
I have a feeling that the problem occurs when running multi-gets with a high number of items (e.g. 1000) with the NorthScale client. It looks like it runs successfully most of the time when tested with 10 or 1000 items.
Thanks Ilhan, I'll take a look at that.
Perry
- can you please paste the socketPool config? i'm interested in the timeout values
- when you're mgetting 1000+ items, what is the average item size? 1k? 100k?
- does increasing the timeout values solve the problem?
- are the servers and the clients on the same LAN?
Hi Attila,
* socketPool configurations: Increased connectionTimeout for northscale from 10 seconds to 30 seconds. No difference, same error.
* Item size in the test is 285 bytes (size of the binary serialized object).
* Server and client are on the same machine, no network roundtrip.
Looking at the logs of the NorthScale client for consecutive runs of the test, the problem happens as follows:
1. Test can store 1000 items successfully
2. Test cannot get any of the items. The log has the following last line after the items have been stored and before the get operation starts:
2010-08-11 10:25:11,276 [7] WARN Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Marking pool 192.168.1.98:11212 as dead
At this point there is no exception and the client returns an empty dictionary (which is another problem: it should report that there was an error).
3. Running the test again, test cannot connect to server at all (and server needs to be restarted):
2010-08-11 10:41:56,925 [7] DEBUG NorthScale.Store.MessageStreamListener - Starting the listener. Queue=True
2010-08-11 10:41:56,930 [8] DEBUG NorthScale.Store.MessageStreamListener - Started working.
2010-08-11 10:41:56,969 [8] DEBUG NorthScale.Store.MessageStreamListener - finding the first (still) working pool.
2010-08-11 10:41:57,388 [8] DEBUG NorthScale.Store.MessageStreamListener - Found pool url http://192.168.1.98:8080/pools/default/bucketsStreaming/default
2010-08-11 10:41:57,389 [8] DEBUG NorthScale.Store.MessageStreamListener - Start receiving messages.
2010-08-11 10:41:57,584 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Acquiring stream from pool.
2010-08-11 10:41:57,587 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Could not get a socket from the pool, Creating a new item.
2010-08-11 10:41:57,596 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Done.
2010-08-11 10:41:57,617 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Releasing socket e2e2ba24-ed20-414e-8ac2-278ad9c13aec
2010-08-11 10:41:57,624 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Are we alive? True
2010-08-11 10:41:57,633 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Acquiring stream from pool.
2010-08-11 10:41:57,636 [7] DEBUG Enyim.Caching.Memcached.PooledSocket - Socket e2e2ba24-ed20-414e-8ac2-278ad9c13aec was reset
2010-08-11 10:41:57,640 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Socket was reset. e2e2ba24-ed20-414e-8ac2-278ad9c13aec
2010-08-11 10:42:08,175 [7] ERROR Enyim.Caching.Memcached.Operations.Operation - System.IO.IOException: Failed to read from the socket '192.168.1.98:11212'. Error: TimedOut
at Enyim.Caching.Memcached.PooledSocket.BasicNetworkStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.BufferedStream.Read(Byte[] array, Int32 offset, Int32 count)
at Enyim.Caching.Memcached.PooledSocket.Read(Byte[] buffer, Int32 offset, Int32 count)
at Enyim.Caching.Memcached.Operations.Binary.BinaryResponse.Read(PooledSocket socket)
at Enyim.Caching.Memcached.Operations.Binary.StoreOperation.ExecuteAction()
at Enyim.Caching.Memcached.Operations.Operation.Execute()
2010-08-11 10:42:08,238 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Releasing socket e2e2ba24-ed20-414e-8ac2-278ad9c13aec
2010-08-11 10:42:08,239 [7] DEBUG Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Are we alive? True
2010-08-11 10:42:08,253 [7] WARN Enyim.Caching.Memcached.MemcachedNode+InternalPoolImpl - Marking pool 192.168.1.98:11212 as dead
4. At any time during the previous steps, using enyim client against memcached server directly works OK.
It looks like something in the NorthScale client (versions 2.2 and above) causes a problem in the NorthScale server. Let me remind you that this test works with version 2.0b of the client.
Thanks.
Can you please try this version?
http://drop.io/wstg65g/asset/northscale-store-2-4-25-gd2c8b0d-zip
one more question i forgot to ask. how are you using the client?
a) one static instance per bucket
b) one instance per thread
c) using(client) { client.store(..); }
Same behaviour, except this version waits indefinitely at the same point where the normal version fails.
In the tests, our usage is equivalent to (a).
BTW, you can have a look at the test source attached in post #17 of this thread if that would help.
Thanks.
I tested the issue with NorthScale client 2.6 but the problem still exists.
This time I attached detailed profiler session outputs for 2 consecutive runs. Both fail after 100 seconds which seems to be the timeout for socket read (records with value of 99,241 ms under the "Own Time" column).
In addition, also attached are the log4net outputs for errors and the test solution (you need to reference NUnit and NorthScale 2.6 assemblies, solution got too big to attach with those).
To recap:
The test passes with Memcached server and the Enyim client, but with NorthScale server and the NorthScale client it fails during multi-get in the FIRST RUN and cannot read from the socket at all in the SECOND RUN. Since the cache client is initialized from scratch and the cache is flushed each time before the tests, the failure of the second run indicates a problem in the NorthScale server. However, this problem may be triggered by the NorthScale client.
We are looking forward to progress on this issue, since it also prevents us from checking out Membase.
Regards,
Ilhan
Ilhan, we just released Membase Server beta 4 which introduces support for memcached bucket types. It would be very helpful to know whether this issue exists in that version.
Thanks.
Perry
Hi Perry,
I ran the test successfully on Membase Server beta 4 with both a memcached and a membase bucket. Even with 10,000 items there is no problem.
However, the memcached bucket cannot be flushed: items stay in the bucket on subsequent runs. The Membase bucket can be flushed.
Thanks for letting me know about beta 4 :)
Ilhan
Thanks Ilhan, I'll look into that and let you know what the problem is.
Thanks again.
Perry
Ilhan, it looks like flush is actually doing the right thing, but not resetting the "curr_items" stat. I've filed a bug for it (http://bugs.northscale.com/show_bug.cgi?id=2497).
Hope everything else is going well. Let me know if there's anything else I can help with. I think this thread is done now, please open a new one for any new issues.
Thanks!
Ilhan, I wanted to follow up with you on this. In fact, that bug is being marked as invalid since there's too much work involved in resetting the item count during a flush for memcached. The idea is to keep it as simple and as fast as possible and as long as the items are no longer actually available, there doesn't need to be any more work done.
For the Membase bucket type, flush takes on a slightly different meaning since we have to be absolutely sure of what data is in the database and how much space it's taking up, etc. That's why flush on Membase does clear the item count.
For reference as well, the open source memcached server does not clear the current items stat on a flush either.
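To illustrate why resetting the count is extra work, here is a toy model, assuming a flush that only records a cut-off and discards stale items lazily when they are next accessed. It is a sketch of one way this behaviour can arise, not the actual memcached or Membase code; LazyFlushCache and all of its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Toy cache: Flush is O(1) because it only remembers a cut-off stamp.
class LazyFlushCache
{
    // Each entry stores the logical time at which it was written, plus its value.
    private readonly Dictionary<string, KeyValuePair<long, string>> items =
        new Dictionary<string, KeyValuePair<long, string>>();
    private long clock;      // logical clock, advanced on every Set
    private long flushedAt;  // entries stamped at or before this are stale

    // Counterpart of the "curr_items" stat in this model.
    public int CurrItems { get; private set; }

    public void Set(string key, string value)
    {
        clock++;
        if (!items.ContainsKey(key)) CurrItems++;
        items[key] = new KeyValuePair<long, string>(clock, value);
    }

    // No walk over the items, so the counter is left untouched.
    public void Flush()
    {
        flushedAt = clock;
    }

    public string Get(string key)
    {
        KeyValuePair<long, string> entry;
        if (!items.TryGetValue(key, out entry)) return null;
        if (entry.Key <= flushedAt)
        {
            // Stale entry: discard it now and only now adjust the counter.
            items.Remove(key);
            CurrItems--;
            return null;
        }
        return entry.Value;
    }
}

static class LazyFlushDemo
{
    static void Main()
    {
        var cache = new LazyFlushCache();
        cache.Set("a", "1");
        cache.Set("b", "2");
        cache.Flush();
        Console.WriteLine(cache.CurrItems);           // count unchanged by the flush
        Console.WriteLine(cache.Get("a") ?? "(null)"); // item is no longer available
        Console.WriteLine(cache.CurrItems);           // count drops on lazy removal
    }
}
```

In this model, making the count exact at flush time would mean visiting every item, which is the kind of work the flush is trying to avoid.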
Please let me know if there's anything else I can help you with.
Thanks.
Perry
Thanks ilhan, I'll take a look at the diagnostics. We have some known issues with running the server on a host whose IP address changes. You've found the correct workaround; the best practice is to use static IP addresses.
Thanks, I'll get back to you.
Perry