.NET Client Missing Records During Rebalance
Hello. I'm new to Couchbase and loving it so far, but I'm seeing unexpected behavior with the .NET client: missing records and occasional null reference exceptions during a failover and rebalance. Perhaps someone can help me understand why this is happening.
I have a cluster of 5 Couchbase servers running 1.8.1, named "couchbase-1" through "couchbase-5", with a Couchbase bucket configured to keep 2 replica copies. I'm using the .NET client, and I adapted the CouchbaseSample into a simple test that stores a few records and then loops through those keys repeatedly, getting each record and reporting whenever one is unavailable. Here's the code:
//Manually configure CouchbaseClient
//May also use app/web.config section
var config = new CouchbaseClientConfiguration();
config.Bucket = "devbucket";
config.BucketPassword = "devpassword";
config.Urls.Add(new Uri("http://couchbase-1:8091/pools"));
config.DesignDocumentNameTransformer = new ProductionModeNameTransformer();
config.HttpClientFactory = new HammockHttpClientFactory();
//Quick test of Store/Get operations
var client = new CouchbaseClient(config);
for (int i = 1; i < 50; i++)
    client.Store(StoreMode.Set, i.ToString(), String.Empty);
while (true)
    for (int i = 1; i < 50; i++)
        if (client.Get(i.ToString()) == null)
            Console.Out.WriteLine("MISS: " + i.ToString());
While this is running, I go into the Couchbase web admin and fail over one of the nodes. During the failover I usually see one or a handful of missing records, and then it stabilizes. Then I add the node back and rebalance. During the rebalance I see many missing records coming and going until the rebalance finishes. Occasionally, I see this null reference exception and the program crashes:
System.NullReferenceException: Object reference not set to an instance of an object.
at Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl.MarkAsDead() in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\Memcached\MemcachedNode.cs:line 379
at Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl.InitPool() in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\Memcached\MemcachedNode.cs:line 246
at Enyim.Caching.Memcached.MemcachedNode.Acquire() in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\Memcached\MemcachedNode.cs:line 125
at Enyim.Caching.Memcached.MemcachedNode.ExecuteOperation(IOperation op) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\Memcached\MemcachedNode.cs:line 515
at Enyim.Caching.Memcached.MemcachedNode.Enyim.Caching.Memcached.IMemcachedNode.Execute(IOperation op) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\Memcached\MemcachedNode.cs:line 604
at Couchbase.CouchbaseClient.ExecuteWithRedirect(IMemcachedNode startNode, ISingleItemOperation op) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\src\Couchbase\CouchbaseClient.cs:line 291
at Couchbase.CouchbaseClient.PerformTryGet(String key, UInt64& cas, Object& value) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\src\Couchbase\CouchbaseClient.cs:line 113
at Enyim.Caching.MemcachedClient.TryGet(String key, Object& value) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\MemcachedClient.cs:line 150
at Enyim.Caching.MemcachedClient.Get(String key) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\lib\EnyimMemcached\Enyim.Caching\MemcachedClient.cs:line 125
at CouchbaseSample.Program.Main(String[] args) in C:\Documents and Settings\Developer\My Documents\couchbase-couchbase-net-client-b4b76af\src\CouchbaseSample\Program.cs:line 45
Am I correct in thinking this is unusual behavior? Can anyone provide advice on how to code or configure this differently to achieve the high availability I am expecting?
Thanks,
Ilsja
Ilsja,
There are a couple of things you can do here. The current versions of the .NET client library have a new set of API methods whose names start with Execute (for example, ExecuteStore() instead of just Store()). These methods return an actual result object that lets you inspect the result code for success or failure and identify why a failure happened if needed. So that API can help diagnose problems, and it also lets you handle the errors appropriately in the client. For example, if you hit a timeout error, you can identify that and retry the operation with a backoff, or bubble up the error info to the end user ("Retrying in 2 [4, 8, ...] seconds") as appropriate. Info on the Execute* methods is at:
http://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-rn...
Also the individual Execute* methods are documented in the main API reference there.
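To make the retry-with-backoff idea concrete, here is a minimal sketch. It is not part of the Couchbase client: Attempt<T> and RetryHelper are hypothetical names invented for this illustration, and the commented usage assumes the object returned by ExecuteGet() exposes Success, Value and Message properties, which should be checked against the API reference for your client version.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper (not a Couchbase type): carries the outcome of one
// attempt so the retry loop does not depend on any client library.
public sealed class Attempt<T>
{
    public bool Success { get; set; }
    public T Value { get; set; }
    public string Message { get; set; }
}

// Hypothetical helper (not a Couchbase type).
public static class RetryHelper
{
    // Runs 'operation' up to maxAttempts times. After each failed attempt
    // except the last, waits and then doubles the delay (2s, 4s, 8s, ...).
    // 'sleep' can be replaced in tests; it defaults to Thread.Sleep.
    public static Attempt<T> WithBackoff<T>(
        Func<Attempt<T>> operation,
        int maxAttempts,
        int initialDelayMs,
        Action<int> sleep = null)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");
        if (sleep == null) sleep = ms => Thread.Sleep(ms);

        int delayMs = initialDelayMs;
        Attempt<T> last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                last = operation();
            }
            catch (Exception ex)
            {
                // Catching everything is acceptable only in a test harness
                // like this one; it keeps the loop alive so the failure can
                // be reported instead of crashing the program.
                last = new Attempt<T> { Success = false, Message = ex.Message };
            }

            if (last.Success) return last;

            if (attempt < maxAttempts)
            {
                sleep(delayMs);
                delayMs *= 2;
            }
        }
        return last; // the final failed attempt, with its message
    }
}

// Usage sketch (assumed property names on the ExecuteGet result):
//
// var outcome = RetryHelper.WithBackoff(() =>
// {
//     var r = client.ExecuteGet(key);
//     return new Attempt<object> { Success = r.Success, Value = r.Value, Message = r.Message };
// }, 4, 2000);
// if (!outcome.Success)
//     Console.Out.WriteLine("MISS: " + key + " (" + outcome.Message + ")");
```

A key that genuinely does not exist would also come back as a failure, so in real code it is worth inspecting the result code or message and retrying only the failures that look transient, such as timeouts.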
Additional information can be extracted from the internal logging of the client. You can use the NLog or log4net adapters that are provided as part of the Couchbase .NET client library, which will log detailed info on what is happening under the hood. Instructions for setting up this logging are at:
http://www.couchbase.com/docs/couchbase-sdk-net-1.1/couchbase-sdk-net-lo...
Finally, I am aware of some recently uncovered issues with the current 1.1.6 client that sound similar to your use case. We have a 1.2 release in the works that should have some fixes for client issues during a rebalance. Certainly hitting a NullReferenceException during rebalance is not acceptable. The other errors may be timeouts (in particular, if you've got more data than RAM, a rebalance may be causing more requests to hit disk, which could be causing timeouts); that should become clearer from the logging output and the Execute* result codes.
Regards,
Tim