Smart client sometimes hangs indefinitely
We're running into cases where calls to ClientManager.getClient() are sometimes hanging indefinitely. Note that getClient() on one smart client may be called from multiple threads. Any ideas?
Thanks, Amy
"bd8352f5:flus:29" daemon prio=10 tid=0x00000000408f5800 nid=0x2487 waiting on condition [0x00007f60e6bc6000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f60f50406c8> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
at com.membase.store.util.BucketUpdateResponseHandler.getReceivedFuture(BucketUpdateResponseHandler.java:139)
at com.membase.store.util.BucketUpdateResponseHandler.getLastResponse(BucketUpdateResponseHandler.java:117)
at com.membase.store.util.BucketMonitor.getServerList(BucketMonitor.java:93)
at com.membase.store.ClientManager.getBucketMembers(ClientManager.java:360)
at com.membase.store.ClientManager.getClient(ClientManager.java:458)
"bd8352f5:flus:31" daemon prio=10 tid=0x000000004083a800 nid=0x2489 waiting on condition [0x00007f60e69c4000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f60f5040c60> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
at com.membase.store.util.BucketUpdateResponseHandler.getReceivedFuture(BucketUpdateResponseHandler.java:139)
at com.membase.store.util.BucketUpdateResponseHandler.getLastResponse(BucketUpdateResponseHandler.java:117)
at com.membase.store.util.BucketMonitor.getServerList(BucketMonitor.java:93)
at com.membase.store.ClientManager.getBucketMembers(ClientManager.java:360)
at com.membase.store.ClientManager.getClient(ClientManager.java:458)
"bd8352f5:flus:28" daemon prio=10 tid=0x00000000407f8000 nid=0x2486 waiting on condition [0x00007f60e6cc7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f60f50360b0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
at com.membase.store.util.BucketUpdateResponseHandler.getReceivedFuture(BucketUpdateResponseHandler.java:139)
at com.membase.store.util.BucketUpdateResponseHandler.getLastResponse(BucketUpdateResponseHandler.java:117)
at com.membase.store.util.BucketMonitor.getServerList(BucketMonitor.java:93)
at com.membase.store.ClientManager.getBucketMembers(ClientManager.java:360)
at com.membase.store.ClientManager.getClient(ClientManager.java:458)
"Memcached IO over {MemcachedConnection to /10.102.59.192:11211 /10.102.59.220:11211}" prio=10 tid=0x0000000040750000 nid=0x2495 runnable [0x00007f60e5db9000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x00007f60f521fde0> (a sun.nio.ch.Util$1)
- locked <0x00007f60f521fdc8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007f60f520c908> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:188)
at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:1591)
"Memcached IO over {MemcachedConnection to /10.102.59.192:11211 /10.102.59.220:11211}" prio=10 tid=0x00000000407d5800 nid=0x2490 runnable [0x00007f60e62be000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x00007f60f51c5500> (a sun.nio.ch.Util$1)
- locked <0x00007f60f51c54e8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007f60f516a7f0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:188)
at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:1591)
Yes, this only seems to happen sporadically, even though I don't touch the bucket configuration. Restarting the NorthScale memcached server seems to help.
That's a clue. Does the log show anything unusual in the way of connections?
Just to give you some background, the smart client maintains a connection to the cluster to receive updates. If an update is received or the connection is interrupted, getClient() calls can block for a short time.
Are you initializing the client with more than one server? How are you initializing the ClientManager?
Thanks,
- Matt
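While the cause is being tracked down, one possible caller-side guard is to run the blocking call on a worker thread and bound the wait with Future.get. This is only a sketch: BoundedCall and callWithTimeout are hypothetical names, the Callable stands in for whatever wraps ClientManager.getClient(), and the lambdas assume Java 8 or later. The thread dump shows the wait going through acquireSharedInterruptibly, so cancelling with an interrupt should unpark the worker, but whether the client library cleans up properly after that is an assumption that would need checking.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    // Runs the call on a pool thread and waits at most timeoutMs for its result.
    // Returns null if the call timed out or the calling thread was interrupted.
    public static <T> T callWithTimeout(ExecutorService pool, Callable<T> call, long timeoutMs) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not stay parked
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Stand-ins for the real call: one returns promptly, one takes too long.
            String fast = callWithTimeout(pool, () -> "client", 1000);
            String slow = callWithTimeout(pool, () -> { Thread.sleep(5000); return "late"; }, 100);
            System.out.println(fast + " / " + slow); // prints "client / null"
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A null result here only means the caller stopped waiting; it says nothing about why the underlying call was stuck, so it is worth logging each timeout alongside the server logs.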
There should be no problem with multiple threads. It looks like they're all waiting on a countdown latch, which probably means the client configuration hasn't completed yet. Are you sure the bucket you're trying to connect to exists?
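To illustrate what the thread dump suggests, here is a minimal sketch of the pattern, not the actual BucketUpdateResponseHandler code. ResponseGate and its methods are hypothetical names. It shows that a CountDownLatch that is never counted down keeps waiters blocked, and that the timed await(long, TimeUnit) overload returns false instead of parking forever the way the untimed await() in the traces does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {
    // Hypothetical stand-in for a handler that signals when a configuration response arrives.
    public static final class ResponseGate {
        private final CountDownLatch received = new CountDownLatch(1);

        // Called by whichever thread receives the response.
        public void markReceived() {
            received.countDown();
        }

        // Bounded wait: true if the response arrived, false on timeout or interrupt.
        public boolean awaitReceived(long timeout, TimeUnit unit) {
            try {
                return received.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ResponseGate gate = new ResponseGate();

        // Nothing has counted the latch down yet, so the wait times out.
        System.out.println("before response: " + gate.awaitReceived(100, TimeUnit.MILLISECONDS)); // false

        // Once the response is marked as received, waiters return immediately.
        gate.markReceived();
        System.out.println("after response: " + gate.awaitReceived(100, TimeUnit.MILLISECONDS)); // true
    }
}
```

If the response never arrives, for example because the bucket does not exist or the update connection stalled, an untimed await() would sit in exactly the WAITING (parking) state shown above.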