Unexpected restart of couchbase buckets
Mon, 12/03/2012 - 05:36
Hello.
We are using Couchbase 1.8.1 community edition (build-937).
We have 1 node and 4 couchbase buckets configured on our test environment.
Our java application is deployed on the same server as Couchbase and uses couchbase-client v1.0.3.
There are some suspicious messages in our logs:
[2012-12-03 09:18:51,165] INFO - com.couchbase.client.CouchbaseConnection - Reconnecting due to exception on {QA sa=db1.ra1.iv/10.30.243.147:11210, #Rops=1, #Wops=0, #iq=0, topRop=Cmd: 0 Opaque: 30728 Key: sprofiles_user-profile-updates, topWop=null, toWrite=0, interested=1}
java.io.IOException: Disconnected unexpected, will reconnect.
at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:452) [spymemcached-2.8.2.jar:2.8.2]
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:380) [spymemcached-2.8.2.jar:2.8.2]
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:242) [spymemcached-2.8.2.jar:2.8.2]
at com.couchbase.client.CouchbaseConnection.run(CouchbaseConnection.java:230) [couchbase-client-1.0.3.jar:1.0.3]
[2012-12-03 09:18:51,165] WARN - com.couchbase.client.CouchbaseConnection - Closing, and reopening {QA sa=db1.ra1.iv/10.30.243.147:11210, #Rops=1, #Wops=0, #iq=0, topRop=Cmd: 0 Opaque: 30728 Key: sprofiles_user-profile-updates, topWop=null, toWrite=0, interested=1}, attempt 0.
[2012-12-03 09:18:51,165] WARN - n.s.m.p.binary.BinaryMemcachedNodeImpl - Discarding partially completed op: Cmd: 0 Opaque: 30728 Key: sprofiles_user-profile-updates
[2012-12-03 09:18:51,165] ERROR - c.r.l.client2.continuous.SnapshotHandler - Unhandled exception:
java.lang.RuntimeException: Exception waiting for value
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1001) ~[spymemcached-2.8.2.jar:2.8.2]
at com.radiumone.udb.membase.MembaseWrapper.getBytes(MembaseWrapper.java:55) ~[udb-core-1.23.0-rel-764.jar:na]
at com.radiumone.udb.updater.service.MembaseSnapshotsService.loadState(MembaseSnapshotsService.java:125) ~[udb-updater-1.23.0-rel-764.jar:na]
at com.radiumone.udb.updater.service.MembaseSnapshotsService.saveSnapshot(MembaseSnapshotsService.java:67) ~[udb-updater-1.23.0-rel-764.jar:na]
at com.radiumone.udb.updater.service.MembaseSnapshotsService.saveSnapshot(MembaseSnapshotsService.java:56) ~[udb-updater-1.23.0-rel-764.jar:na]
at com.radiumone.log.client2.continuous.SnapshotHandler.messagesReceived(SnapshotHandler.java:58) ~[log-core-1.23.0-rel-764.jar:na]
at com.radiumone.log.client2.LogServerClient2$3.run(LogServerClient2.java:146) [log-core-1.23.0-rel-764.jar:na]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_33]
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancelled
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:167) ~[spymemcached-2.8.2.jar:2.8.2]
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:62) ~[spymemcached-2.8.2.jar:2.8.2]
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:997) ~[spymemcached-2.8.2.jar:2.8.2]
... 7 common frames omitted
Caused by: java.lang.RuntimeException: Cancelled
... 10 common frames omitted
[2012-12-03 09:18:51,186] INFO - .u.s.p.ProfileUpdateMultiThreadedService - Flushed in 0 millis
[2012-12-03 09:18:51,186] WARN - com.couchbase.client.CouchbaseConnection - Node exepcted to receive data is inactive. This could be due to a failure within the cluster. Will check for updated configuration. Key without a configured node is: sopt-in_opt-in-requests.Then there is:
[2012-12-03 09:20:07,947] WARN - net.spy.memcached.auth.AuthThread - Authentication failed to db1.ra1.iv/10.30.243.147:11210 [2012-12-03 09:20:07,973] INFO - net.spy.memcached.auth.AuthThread - Authenticated to db1.ra1.iv/10.30.243.147:11210 [2012-12-03 09:20:07,975] INFO - net.spy.memcached.auth.AuthThread - Authenticated to db1.ra1.iv/10.30.243.147:11210
and everything is ok for a while.
Looking at the log in the couchbase admin console we can find following failures:
Server error during processing: ["web request failed",
{path,"/pools/default/saslBucketsStreaming"},
{type,exit},
{what,
{timeout,{gen_server,call,[ns_config,get]}}},
{trace,
[{diag_handler,diagnosing_timeouts,1},
{menelaus_web_buckets,
'-handle_sasl_buckets_streaming/2-fun-2-',
2},
{menelaus_web,streaming_inner,3},
{menelaus_web,handle_streaming,4},
{menelaus_web,loop,3},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]}]
Server error during processing: ["web request failed",
{path,"/pools/default/bucketsStreaming/udb2"},
{type,exit},
{what,
{timeout,{gen_server,call,[ns_config,get]}}},
{trace,
[{diag_handler,diagnosing_timeouts,1},
{ns_bucket,get_bucket,1},
{menelaus_web_buckets,
'-handle_bucket_info_streaming/3-fun-0-',
4},
{menelaus_web,streaming_inner,3},
{menelaus_web,handle_streaming,4},
{menelaus_web_buckets,
checking_bucket_access,4},
{menelaus_web,loop,3},
{mochiweb_http,headers,5}]}]
Shutting down bucket "udb1" on 'ns_1@127.0.0.1' for server shutdown
Shutting down bucket "udb2" on 'ns_1@127.0.0.1' for server shutdown
Shutting down bucket "udb3" on 'ns_1@127.0.0.1' for server shutdown
Shutting down bucket "udb4" on 'ns_1@127.0.0.1' for server shutdown
I'm the only node, so I'm the master.
Couchbase Server has started on web port 8091 on node 'ns_1@127.0.0.1'.
Bucket "udb3" loaded on node 'ns_1@127.0.0.1' in 0 seconds.
Bucket "udb4" loaded on node 'ns_1@127.0.0.1' in 0 seconds.
Bucket "udb1" loaded on node 'ns_1@127.0.0.1' in 0 seconds.
Bucket "udb2" loaded on node 'ns_1@127.0.0.1' in 0 seconds.
Bucket "udb1" loaded on node 'ns_1@127.0.0.1' in 0 seconds. (repeated 14 times)
Bucket "udb3" loaded on node 'ns_1@127.0.0.1' in 0 seconds. (repeated 14 times)
Bucket "udb4" loaded on node 'ns_1@127.0.0.1' in 0 seconds. (repeated 14 times)This situation is repeated from time to time during the day (2-4 times a day).
Also the following message is appeared in logs sometimes:
Control connection to memcached on 'ns_1@127.0.0.1' disconnected: {badmatch,
{error,
etimedout}} (repeated 1 times)Could you help us to understand what happens there? What possible reasons are there? Is it a critical issue?
P.S. Our application works without problems on another server.
Can you upload your logs to S3? You can find instructions to do this here:
http://www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchba...
Just leave the name you used for the files and I will take a look.