Membase limitations or I'm missing something
Hi..
I recently started evaluating Membase for one of the projects and was quite impressed with how easily it can be installed and rebalanced, and how easily nodes can be added to or removed from the cluster. When I started executing test cases, I hit some problems, which I've described in this post. I'd appreciate it if someone could confirm whether the findings are correct and explain how we can address them.
Here we go...
Membase version 1.7
Application design and testing approach: Multi-threaded JUnit test. The JUnit test creates multiple threads, and each thread uses Apache HttpClient to send POST/GET requests to a RESTful service created using Spring 3.x. The RESTful service in turn uses the Spymemcached client library to interact with Membase (if Spymemcached is used as a Type 2 client) or Moxi (if Spymemcached is used as a Type 1 client). Apache HTTP Server is the point of entry for requests, and behind it I have 3 Tomcat instances running in a cluster. The RESTful service is deployed as a web application in these Tomcat instances.
1. vBucket support in Spymemcached: I tried to use Spymemcached client library as a Type 2 client (vBucket aware). During testing it was found that when a node in the cluster is brought down, the whole cluster becomes unresponsive: ( see the following link, which suggests that some folks did face the same issue http://code.google.com/p/spymemcached/issues/detail?id=181) Other related issues: http://code.google.com/p/spymemcached/issues/detail?id=136, http://code.google.com/p/spymemcached/issues/detail?id=108
The rest of the findings here are based on using Spymemcached as a Type 1 client which sends requests to Moxi, which in turn interacts with the Membase cluster.
2. It's not possible to avoid OOM errors in high load situations - When the maximum memory allocated to a bucket is reached in Membase, temp OutOfMemory errors are reported if a high number of requests per second is sent to Membase. The test results showed that setting the queue_age_cap, ep_mem_high_wat and ep_mem_low_wat parameters for Membase doesn't guarantee that OOM errors will not be reported. There already exists an issue in JIRA for this: http://www.couchbase.org/issues/browse/MB-4020.
Note that I am sending requests in a while loop, which doesn't really reflect the production environment.
3. Loss of data when the application is under heavy load - It was found that when multiple threads are sending concurrent save requests to the service, not all requests resulted in the creation of an object in Membase. This means that data loss happens when the system is under heavy load. Membase doesn't report OOM errors, which means that the data loss may be happening because of request congestion on Apache HTTP Server or the Apache Tomcat server. Has anyone come across this, and how did you tune your Apache HTTP Server and Tomcat servers?
4. Failure of a server node will result in loss of data - When a Membase server fails, in-memory data that is not yet persisted or redistributed will be lost. I saw that under heavy load conditions there were a lot of 'Can't redistribute...Trying primary again..' messages, which suggests that redistribution may also be failing under heavy load.
5. Shutting down a node brings down the performance of Membase drastically - The cluster performance comes down to 5-10 creates per second. The performance bounces back only after the cluster is rebalanced manually or if you click the Failover button in the Membase console. Is this still an issue as mentioned in this link: http://www.couchbase.org/forums/thread/understanding-membased-availabili....
6. Data loss during cluster warm-up phase - It was found that a newly configured Membase cluster needs some time before it is ready to receive requests. During this warm-up phase requests were not successfully completed, and I would see a "PENDING" status in the Membase console. Sometimes, when a node in the cluster became unresponsive, I also saw the "PENDING" status with a yellow status color. During this time the Membase performance was below expectations, and there was data loss.
Thanks
ashish
There is not much going on with the JUnit test. I simply created a mini framework in which I define the number of create and read threads. Each create thread creates a configurable number of unique key-value pairs, because I wanted a known number of unique items in Membase to get a better idea of the performance. The read threads read objects from Membase that were created in an earlier run of the tests.
Now, when the JUnit test executes, it assigns a unique identifier to each create and read thread. The JUnit test keeps a reference to all the threads it has spawned and every minute checks whether each thread is still active (refer to the isActive method of the Thread class). This is done because multi-threaded testing is not supported in JUnit. The create and read threads use Apache HttpClient to send GET/POST requests to the RESTful service. The drawback of this approach is that if a test fails, it's not reported....but I'm not interested in test success or failure, I was only interested in generating enough load for the server. The Membase console can be used to verify whether all the objects were successfully created by looking at the number of unique items in the Membase cluster.
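The harness described above could be sketched roughly as follows. This is not the poster's code: the class name LoadSketch, the key format and the send callback are all hypothetical, and the callback stands in for the HttpClient POST to the RESTful service. Unlike the original harness, the sketch counts failed sends so that failures are at least visible.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical load generator: N worker threads, each sending M unique keys. */
final class LoadSketch {
    private LoadSketch() {}

    /**
     * Runs `threads` workers, each sending `keysPerThread` unique keys through
     * `send` (a placeholder for the HTTP request). Returns the failure count.
     */
    static long run(int threads, int keysPerThread, Predicate<String> send) {
        AtomicLong failures = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t; // unique identifier per worker thread
            pool.execute(() -> {
                for (int k = 0; k < keysPerThread; k++) {
                    String key = "t" + id + "-k" + k; // unique across all workers
                    boolean ok;
                    try {
                        ok = send.test(key);
                    } catch (RuntimeException e) {
                        ok = false; // an exception counts as a failed request
                    }
                    if (!ok) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown(); // accept no new work, let queued work finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("load run did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return failures.get();
    }
}
```

Waiting on the executor replaces the once-a-minute liveness poll, and comparing the returned failure count with the item count in the Membase console helps separate requests that never reached Membase from writes that Membase rejected.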
If you feel that the testing code will still be helpful, I can post it here. It's not a full-fledged framework, but just a quick and dirty way to generate load for Membase.
regards
ashish
Heh, I only posted cause I was literally about to start writing load tests for Membase. Up to you, it would be nice, I would probably contribute to it by Friday
Hi Ashish, thanks for the detailed description!
I'll address your points one-by-one, will probably rope in someone from the spymemcached team as well:
1 - What version of spy were you using? We've had some issues in the past, but I thought they had all been resolved. Perhaps not...but we definitely want to get it all fixed.
2 - Stepping back a bit, there is a finite amount of RAM available on a system and the speed at which you can fill RAM is much higher than the speed with which you can drain it (to disk). As such, you are correct that there is no way to completely avoid OOM errors. However, I would argue (and am curious whether you agree) that providing the application this type of response is the "right" thing to do (as opposed to either slowing down, or completely blocking requests).
3 - This sort of stuff would concern us greatly if Membase was returning success to the client when it actually did not store the data. As far as we know, this cannot happen (and if it does, there's a bug which we need to fix asap). Can you confirm whether the requests are actually being made to Membase or getting lost somewhere higher in the stack?
4 - This is a bit misleading since spymemcached in this mode thinks that it is a memcached client with multiple servers to try and "redistribute" to. You should repeat your test, and this time "failover" the node that goes down. You should see immediately that all data on that node is once again available. There's much more explanation that can go into this...let me know if you want to discuss further.
5 - I think this is somewhat related to the above, and more specifically related to the use of Moxi here. I would ask that you repeat this test with spymemcached in 'vbucket' mode (which is our recommended best practice anyway) and you should see much better behavior.
6 - Let's get a better term than "data loss". To me, "data loss" means that we told you the item was stored...and then at some point you can't get it back. In all of these situations, you should not have gotten a "success" for any request that was not available afterwards. Now, to the specific situation...yes, when a Membase server (doesn't matter whether it's in a cluster or not) starts up, it will begin loading its data from disk. The current software doesn't allow access to that data until the warmup is complete. We've got plans to improve this behavior with 2.0. "Pending" means that the node itself is up and accessible (from Erlang's perspective) but the memcached process itself (responsible for handling requests) is not responsive.
Thanks again for your details, I hope I was helpful and please let me know what needs further description or discussion.
Perry
In regards to your second point I wanted to mention that the issue you referenced http://www.couchbase.org/issues/browse/MB-4020 will probably never be fixed. The reason I filed it was due to a discussion I had with our chief architect. We decided that the server could definitely slow down the number of requests it processes as it gets closer to hitting a temp OOM error, but that this behavior is better handled by the client. On top of that, it is much easier to implement on the client side than on the server side, and adding exponential backoff to spymemcached for this exact situation is a feature we will be adding soon. The bug I filed is more of a note that this is something we could add, but it is a very, very low priority feature for the server.
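Until the client does this itself, application code can approximate it. Below is a minimal sketch of exponential backoff, assuming the caller wraps each Membase operation in a callback that returns true on success and false on a retryable failure such as temp OOM. The class BackoffRetry, its method and its parameters are hypothetical and not part of spymemcached.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retries an operation with exponential backoff. */
final class BackoffRetry {
    private BackoffRetry() {}

    /**
     * Calls `attempt` until it returns true or `maxAttempts` is reached.
     * The wait between attempts starts at `initialDelayMs` and doubles each
     * time, capped at `maxDelayMs`.
     * Returns the number of attempts used, or -1 if no attempt succeeded.
     */
    static int retry(BooleanSupplier attempt, int maxAttempts,
                     long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i == maxAttempts) {
                break; // no point sleeping after the final attempt
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return -1;
            }
            delay = Math.min(delay * 2, maxDelayMs);
        }
        return -1;
    }
}
```

A caller might use something like BackoffRetry.retry(() -> storeOnce(key, value), 5, 50, 2000), where storeOnce is a placeholder for one store attempt plus a status check. Capping the delay keeps a long outage from producing unbounded waits.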
Hi Perry,
Thanks for your quick response. Here are my responses:
1- I'm using spymemcached-2.7-2-gbd6e366.jar.
I saw that there was an issue with automatic failover in Membase 1.7, which was fixed in 1.7.1, as described in this post: http://www.couchbase.org/forums/thread/membase-server-171-available. Is there a possibility that this issue might be specific to Membase 1.7?
Also, there is a difference in behavior when clicking 'Failover' button Vs clicking 'Remove Server' button. When the 'Remove Server' button is clicked in Membase console then the whole cluster becomes unresponsive or the performance is degraded significantly. Is this still true in Membase 1.7.1?
2- I agree with you that sending OOM errors is better than slowing down the response. So, the recommended way is to catch OOM errors in the client and retry the request after a time delay? A bit more insight on how clients should deal with OOM errors would be really nice.
3- I didn't check the success or failure of operations on the client side. It is possible that Apache HTTP Server or the Apache Tomcat server was getting overloaded with requests, and that this is why some of the requests didn't complete successfully. I'm planning to introduce monitoring capability in the client code to collect success/failure statistics. I'd be interested to know if someone has a similar deployment architecture and what configuration they used for HTTP Server and Tomcat.
4- I had noticed a difference in behavior between clicking the 'Failover' button and shutting down Moxi, Membase and Tomcat on one of the instances. When 'Failover' is clicked, everything works fine. But shutting down a node definitely impacts the performance of Membase drastically. Is this issue related to the automatic failover feature added in Membase 1.7.1?
5- Thanks for this recommendation, I'll once again try Membase 1.7.1 with Spymemcached Type 2 (with vBucket support) client.
6- I didn't put any error handling code in the client, so I think this issue will be addressed by catching exceptions and performing retries in the client code.
Also, can you please give some pointers on the 'Crucial bug fixes around memory usage and lots of other improvements' that were done in Membase 1.7.1?
thanks
ashish
I just want to follow up on point number 2 here. Perry can provide better answers to some of your other questions than I can. With temp OOM there is currently no retry implemented in the client, but it is on our road map. What I suggest you do is check the status of operations after they are sent. The best thing for you to do is call the async functions in MemcachedClient and place the Futures that they return into a queue. Then you can periodically check this queue to make sure that all of the operations succeeded. You can do this by calling future.getStatus().isSuccess() and if you want to see the reason why an operation was or wasn't successful you can call future.getStatus().getMessage().
Another idea for checking status is to have one thread that is your application thread and does all of your normal application work, and another thread that checks that each request you sent succeeded. If a request fails you can simply resubmit it. In this model you would have a queue for your Future objects; the application thread would be the producer and the other thread would be the consumer.
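The queue-of-futures idea can be sketched with plain java.util.concurrent types. In this sketch the store function is a stand-in for an async MemcachedClient call, and a Future that yields false stands in for future.getStatus().isSuccess() returning false. WriteChecker, Pending and the attempt limit are hypothetical names and choices, not spymemcached API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical checker: verifies queued async writes and resubmits failures. */
final class WriteChecker {
    /** A key together with the Future returned by its async write. */
    static final class Pending {
        final String key;
        final Future<Boolean> result;
        final int attempt;

        Pending(String key, Future<Boolean> result, int attempt) {
            this.key = key;
            this.result = result;
            this.attempt = attempt;
        }
    }

    final AtomicInteger succeeded = new AtomicInteger();
    final AtomicInteger abandoned = new AtomicInteger();
    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final Function<String, Future<Boolean>> store; // placeholder for the async client call
    private final int maxAttempts;

    WriteChecker(Function<String, Future<Boolean>> store, int maxAttempts) {
        this.store = store;
        this.maxAttempts = maxAttempts;
    }

    /** Producer side: issue the write and queue its Future for later checking. */
    void submit(String key) {
        queue.add(new Pending(key, store.apply(key), 1));
    }

    /** Consumer side: check queued futures until the queue is empty. */
    void drain() {
        Pending p;
        while ((p = queue.poll()) != null) {
            boolean ok;
            try {
                ok = Boolean.TRUE.equals(p.result.get(1, TimeUnit.SECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                ok = false; // a timeout or execution error counts as a failed write
            }
            if (ok) {
                succeeded.incrementAndGet();
            } else if (p.attempt < maxAttempts) {
                // Resubmit the failed write and queue the new Future for checking.
                queue.add(new Pending(p.key, store.apply(p.key), p.attempt + 1));
            } else {
                abandoned.incrementAndGet(); // give up after maxAttempts
            }
        }
    }
}
```

The application thread would call submit for each write, while a second thread calls drain periodically. Bounding the number of attempts keeps a persistently failing key from circulating in the queue forever.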
Thanks Mike, this looks like a good approach to handle failures and perform retries.
1. vBucket support in Spymemcached: I tried to use Spymemcached client library as a Type 2 client (vBucket aware). During testing it was found that when a node in the cluster is brought down, the whole cluster becomes unresponsive: ( see the following link, which suggests that some folks did face the same issue http://code.google.com/p/spymemcached/issues/detail?id=181) Other related issues: http://code.google.com/p/spymemcached/issues/detail?id=136, http://code.google.com/p/spymemcached/issues/detail?id=108
We've run into this issue as well. We use a connection pool of spymemcached clients since the async and getbulk methods are broken and the client isn't thread safe in general. Since the client can't handle a node dying in the cluster we catch runtime/timeout exceptions and kill the client and check out a new one from the connection pool. It's a bad workaround but it works.
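For readers wondering what such a workaround looks like, here is a rough sketch of a pool that discards a client after a runtime exception and replaces it with a fresh one. It is not Dan's code. ReplacingPool is a hypothetical class written against a generic client type, so the factory and closer callbacks are placeholders for constructing and shutting down a real client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical pool that closes and replaces any client whose call throws. */
final class ReplacingPool<C> {
    private final BlockingQueue<C> idle;
    private final Supplier<C> factory; // creates a fresh client
    private final Consumer<C> closer;  // shuts a broken client down

    ReplacingPool(int size, Supplier<C> factory, Consumer<C> closer) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.factory = factory;
        this.closer = closer;
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Runs op on a pooled client; a client that throws is closed and replaced. */
    <R> R call(Function<C, R> op) {
        C client;
        try {
            client = idle.take(); // blocks until a client is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a client", e);
        }
        try {
            R result = op.apply(client);
            idle.add(client); // healthy client goes back into the pool
            return result;
        } catch (RuntimeException e) {
            closer.accept(client);   // discard the client that failed
            idle.add(factory.get()); // keep the pool at its original size
            throw e;                 // let the caller decide whether to retry
        }
    }
}
```

The sketch assumes the closer and factory callbacks do not themselves throw; if they did, the pool would shrink by one client. Replacing a client on every timeout is expensive, which is consistent with the description of this as a workaround rather than a fix.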
Dan, I'd like to challenge your assertion that the client is not threadsafe. The Membase server itself is what provides atomicity, and does so in a very thread-safe manner.
I'm not saying that there aren't bugs within the code, but I also know that we have been working very hard to resolve them.
I would ask that both you and Ashish take a look at the latest version (2.7.1) which came about from an extensive troubleshooting session between one of our customers and our primary engineer for this product.
Perry
Perry,
I am looking at the release notes for 2.7.1 and one of the resolved issues is a problem with multiple threads. We plan to check out 2.7.1 soon as it also addresses an issue I reported in April regarding bulk and async operations.
I definitely appreciate the work that is going into membase and spymemcached and don't mean to sound negative. I am just reporting on what we've seen.
Thanks,
Dan
Hi Ashish,
My name is Ping. I came across your post on this forum. I am wondering if you have got your problem solved and if you mind sharing your solution. I am running into the same issue as you posted here (#1). I have installed a cluster of two Membase nodes (membase-server-community_x86_64_1.7.1.1) on Linux. For client library, I am using spymemcached-2.7.3. I have deployed a simple restful web service to Tomcat through which I do some set and get. When I remove a node from the cluster and rebalance, I see random failures but when I shut down a node, the whole cluster becomes unresponsive. I got the following stacktrace:
MembaseTest - Timeout waiting for value
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1185)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1200)
at com.pearson.ltg.sso.MembaseTest.get(MembaseTest.java:43)
at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
......
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: 10.3.199.236/10.3.199.236:11210
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:73)
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:38)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1178)
Any help will be much appreciated.
Ping
Hi Perry,
I saw your reply to another similar thread at http://www.couchbase.com/forums/thread/any-good-example-java-code-handle.... Our Membase client is constructed in a way like this:
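import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import net.spy.memcached.MemcachedClient;

String url = "host1:8091, host2:8091";
List<URI> uriList = new ArrayList<URI>();
for (String host : url.split(",")) {
    // Each entry points at a node's REST endpoint, e.g. http://host1:8091/pools
    uriList.add(new URI("http://" + host.trim() + "/pools"));
}
// Connect to the "default" bucket with an empty password
MemcachedClient client = new MemcachedClient(uriList, "default", "");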
I don't have auto-failover enabled as I have only two nodes. As I use server-side Moxi, I understand that there is nothing I need to do whenever there is a topological change in the cluster as long as one node is running fine. But it doesn't look like this in my case. My Tomcat/web service Membase client is running in jdk1.6.0_06 while compiled on Windows using jdk1.6.0_13 and I use SoapUi to call the service. I don't think this is the cause though.
I'm planning to upgrade to the latest client library, spymemcached-2.8.0, but I am not supposed to see the error even with spymemcached-2.7.3, right?
Thanks a lot.
Very interesting. You should contribute your testing code