Couchbase ignores half of my Store requests without telling me why?

I'm using the newest servers (downloaded yesterday) and the latest NuGet package for the .NET API library.
It simply won't store half of the requests, and it doesn't tell me why. No exception. Nothing suspicious in the logs. The behaviour seems random. I know that one of the servers is on its knees due to heavy resource usage, but that is intentional: I want to see how things work and replicate under load.

This is the code I use:

		public void Store(string key, object data)
		{
			var dataString = data.ToString();
			var result = _couch.ExecuteStore(Enyim.Caching.Memcached.StoreMode.Set, key, dataString, TimeSpan.FromMinutes(10));
 
			if (!result.Success)
				throw result.Exception;
		}

I wrote a .NET MVC app to help me test these things. I added items using numbers as keys, in this case 1 through 10, with some randomly typed text as the value. Half of them get stored. Some of the keys I tried, 5 for example, refuse to get stored to the cache, no matter how many times I retry storing something under them.

I also notice that it takes time before I can access the cached items. I guess that's because of the async nature of Couchbase? It may take up to 5 seconds before I can retrieve something that was successfully stored.

1 Answer

For sure something is not happening as expected. Successful writes will be readable immediately afterwards. Whilst persistence is asynchronous, that does not affect data read availability.
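
One way to check this independently of views and the web console is to read the key back by key right after storing it. The following is only a sketch, assuming the 1.2 .NET client and an already configured CouchbaseClient; the ReadAfterWriteCheck class and the StoreThenGet method are illustrative names, not part of the SDK.

```csharp
using System;
using Couchbase;
using Enyim.Caching.Memcached;

public static class ReadAfterWriteCheck
{
    // Stores a value and immediately reads it back by key.
    // Returns true only when both operations succeed and the value matches.
    public static bool StoreThenGet(CouchbaseClient client, string key, string value)
    {
        var stored = client.ExecuteStore(StoreMode.Set, key, value, TimeSpan.FromMinutes(10));
        if (!stored.Success)
            return false;

        // A key-based get does not go through views, so it does not wait for persistence.
        var fetched = client.ExecuteGet(key);
        return fetched.Success && Equals(fetched.Value, value);
    }
}
```

If this returns true for a key that never shows up in a view, the write itself is fine and the delay is on the view side.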

Views, however, depend on data being persisted before it appears in a result, so if you have an overloaded server and are using views, that may explain why you don't see the results you expect. You can also control index update behaviour for your queries; read http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-views-wri... for more details.
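
As a rough illustration of controlling index update behaviour from the 1.2 .NET client, the query below asks the server to update the index before answering. The design document name "test" and the view name "all_items" are placeholders, and FreshViewQuery is just an illustrative wrapper. Documents that have not been persisted yet can still be missing from the result. When querying the view over plain HTTP, the equivalent is adding stale=false to the query string.

```csharp
using System;
using Couchbase;

public static class FreshViewQuery
{
    // Prints the document ids returned by a view, forcing an index update first.
    public static void PrintRowIds(CouchbaseClient client)
    {
        // "test" and "all_items" are placeholder design document and view names.
        var view = client.GetView("test", "all_items").Stale(StaleMode.False);

        foreach (var row in view)
            Console.WriteLine(row.ItemId);
    }
}
```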

To find out why writes are failing, it is recommended to use the newer .NET API, which provides more than the simple boolean response, so that you can understand the cause of the error.

http://www.couchbase.com/docs/couchbase-sdk-net-1.2/couchbase-sdk-net-op... has the details on how to get those detailed operation results with .NET. Once you have them, they should shed some light on what is going on in your environment.
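
Applied to the Store method from the question, a minimal sketch could look like this. It assumes the 1.2 client, where the result of ExecuteStore carries Message and StatusCode alongside Success and Exception; the CacheWriter class and the wording of the error text are made up for illustration. Note that Exception may be null on a failed store, in which case `throw result.Exception;` raises a NullReferenceException instead of reporting the real cause.

```csharp
using System;
using Couchbase;
using Enyim.Caching.Memcached;

public class CacheWriter
{
    private readonly CouchbaseClient _couch;

    public CacheWriter(CouchbaseClient couch)
    {
        _couch = couch;
    }

    public void Store(string key, object data)
    {
        var result = _couch.ExecuteStore(
            StoreMode.Set, key, data.ToString(), TimeSpan.FromMinutes(10));

        if (result.Success)
            return;

        // Message and StatusCode describe the failure even when no exception was raised.
        var description = string.Format(
            "Store failed for key '{0}': {1} (status code: {2})",
            key, result.Message, result.StatusCode);

        // result.Exception may be null; it is passed along as the inner exception either way.
        throw new InvalidOperationException(description, result.Exception);
    }
}
```

Logging the message and status code for a key that consistently fails (such as 5 in the question) should show whether the node responsible for that key is rejecting or timing out the write.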

Hello Frank!

Thank you dearly for the quick and helpful response.
I am using the latest package, and when I inspect the result object that gets returned (and its Exception property), I see nothing out of the ordinary.

That explains why the view is lagging a bit: I called the Store method and immediately afterwards called a Get against the view to populate some data in a web page.

At any rate, when watching for data storage operations, I check the Couchbase web console under the bucket's documents, and also a view I created to see the lagging sync (I access the view through a GET request rather than the API).
Whether the Store operation actually succeeds or fails still seems random. If the Store operation fails, there is no exception, and I follow the pattern described in the link you provided.
There is also this: if a Store operation fails for a given key, then no further attempt to store anything under that key will succeed.

One of the servers is at around 95% memory usage and rather high CPU usage. This machine is 32-bit.
I use two very different virtual machines in a cluster. The stronger machine is 64-bit. Both have 2 cores (fewer than recommended) and less memory than recommended.
Even though the machines are of the weaker kind, I think the system should work together.

My theory is that some kind of algorithm handles the storing, that one of the machines (probably the weaker one) fails to do its part, and that the data is then lost. But there is nothing in the log to support this theory.

However, I noticed just now that there is an issue with rebalancing. The log reports that rebalancing failed. This is what the log says (can this be the reason for this behaviour?):

Rebalance exited with reason {unexpected_exit,
 {'EXIT',<0.17907.2>,
 {badmatch,
 [{'EXIT',
 {{badmatch,{error,closed}},
 {gen_server,call,
 [<0.17906.2>,had_backfill,30000]}}}]}}}

A completely fresh installation of Couchbase on two almost identical servers fixed the problem.