Java client occasionally leaks (non-daemon) Netty I/O threads after shutdown

We're using CouchbaseClient v1.4 from a Java server with moderately high concurrency.
When we shut down the Couchbase client, it usually comes down cleanly, but we occasionally see lingering Netty and memcached I/O threads:

"New I/O  worker #1" prio=10 tid=0x00000000014a3000 nid=0x58bd runnable [0x00007fc7c2f88000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000059a590988> (a sun.nio.ch.Util$2)
        - locked <0x000000059a5909a0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000059a5af320> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:52)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:223)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
 
"Memcached IO over {MemcachedConnection to /10.6.70.3:11210 /10.6.70.2:11210 /10.6.70.1:11210}" prio=10 tid=0x00007fc7dcd48800 nid=0x58b8 runnable [0x00007fc7c38f1000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x000000059e331fe0> (a sun.nio.ch.Util$2)
        - locked <0x000000059e331ff8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000059a513d90> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:222)
        at com.couchbase.client.CouchbaseMemcachedConnection.run(CouchbaseMemcachedConnection.java:158)

Because these threads are non-daemon, they keep the JVM alive, so the Java server can't shut down cleanly and we have to resort to "kill -9" to clean up.
We figured out how to make the second thread start as a daemon, but the "New I/O worker" thread can't be made a daemon without gutting the client libs pretty heavily.
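
For anyone who wants to check which threads are keeping the JVM alive, here is a minimal diagnostic sketch that uses only the JDK. The class name LingeringThreads is hypothetical and not part of the Couchbase client. The idea is to call it right after the client's shutdown and log whatever non-daemon threads are still running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Couchbase or Netty API.
final class LingeringThreads {

    private LingeringThreads() {}

    /** Returns all live non-daemon threads except the calling thread. */
    static List<Thread> nonDaemonThreads() {
        List<Thread> result = new ArrayList<Thread>();
        // getAllStackTraces() returns a snapshot of the live threads in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the name and state of every thread that could block JVM exit.
        for (Thread t : nonDaemonThreads()) {
            System.out.println(t.getName() + " state=" + t.getState());
        }
    }
}
```

In a situation like the one above, the two threads from the dumps would presumably show up in this list. JVM-internal or container threads may appear as well, so the output is a starting point for investigation rather than a definitive list of leaks.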

Has anyone else seen this? Any suggestions for workarounds?

@Dengberg, do you have some example code that shows this behaviour? I tried to reproduce it, but no joy so far.

I haven't been able to reproduce this in isolated unit tests. We (Evernote) run around 500 Java7+Tomcat servers that receive a lot of activity over the course of a week, and see this problem come up in the wild around 20% of the time when we try to shut down. So it's a bit hard to narrow down the exact conditions that cause the thread leakage.

Our current workaround:

--- /tmp/BucketMonitor.java	2014-05-16 10:04:58.000000000 -0700
+++ src/main/java/com/couchbase/client/vbucket/BucketMonitor.java	2014-05-16 10:04:16.000000000 -0700
@@ -97,13 +97,28 @@
     this.configParser = configParser;
     this.host = cometStreamURI.getHost();
     this.port = cometStreamURI.getPort() == -1 ? 80 : cometStreamURI.getPort();
-    factory = new NioClientSocketChannelFactory(Executors.newCachedThreadPool(),
-      Executors.newCachedThreadPool());
+    factory = new NioClientSocketChannelFactory(newThreadPool(),
+      newThreadPool());
     this.headers = new HttpMessageHeaders();
       this.provider = provider;
   }
 
   /**
+   * Creates an executor based on a simple thread pool that only
+   * uses 'daemon' threads.
+   */
+  private java.util.concurrent.Executor newThreadPool() {
+    return Executors.newCachedThreadPool(
+      new java.util.concurrent.ThreadFactory() {
+        public Thread newThread(Runnable r) {
+          Thread t = new Thread(r);
+          t.setDaemon(true);
+          return t;
+        }
+      });
+  }
+
+  /**
    * Take any action required when the monitor appears to be disconnected.

3 Answers

Hi, there is a high chance that this is a bug. I'll go investigate - can you open a ticket here in the meantime? http://www.couchbase.com/issues/browse/JCBC


Patch to force Couchbase to start NIO threads as 'daemons' to permit clean server shutdown:

--- /tmp/BucketMonitor.java	2014-05-16 10:04:58.000000000 -0700
+++ src/main/java/com/couchbase/client/vbucket/BucketMonitor.java	2014-05-16 10:04:16.000000000 -0700
@@ -97,13 +97,28 @@
     this.configParser = configParser;
     this.host = cometStreamURI.getHost();
     this.port = cometStreamURI.getPort() == -1 ? 80 : cometStreamURI.getPort();
-    factory = new NioClientSocketChannelFactory(Executors.newCachedThreadPool(),
-      Executors.newCachedThreadPool());
+    factory = new NioClientSocketChannelFactory(newThreadPool(),
+      newThreadPool());
     this.headers = new HttpMessageHeaders();
       this.provider = provider;
   }
 
   /**
+   * Creates an executor based on a simple thread pool that only
+   * uses 'daemon' threads.
+   */
+  private java.util.concurrent.Executor newThreadPool() {
+    return Executors.newCachedThreadPool(
+      new java.util.concurrent.ThreadFactory() {
+        public Thread newThread(Runnable r) {
+          Thread t = new Thread(r);
+          t.setDaemon(true);
+          return t;
+        }
+      });
+  }
+
+  /**
    * Take any action required when the monitor appears to be disconnected.
    */
   protected void notifyDisconnected() {
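
For reference, the same idea as the patch can be pulled out into a standalone class. This is only a sketch under assumptions: the names DaemonThreadFactory and newDaemonCachedThreadPool, and the "cb-io" prefix used below, are made up for illustration and do not exist in the Couchbase client. It uses only java.util.concurrent.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: a ThreadFactory whose threads never block JVM exit.
final class DaemonThreadFactory implements ThreadFactory {

    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Name the thread so it is easy to spot in a thread dump.
        Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
        // Daemon threads do not prevent the JVM from exiting.
        t.setDaemon(true);
        return t;
    }

    /** Cached thread pool backed by daemon threads only. */
    static ExecutorService newDaemonCachedThreadPool(String prefix) {
        return Executors.newCachedThreadPool(new DaemonThreadFactory(prefix));
    }
}
```

With something like this, the two executors passed to NioClientSocketChannelFactory in the patch could each be created with DaemonThreadFactory.newDaemonCachedThreadPool("cb-io"). Judging by the ThreadRenamingRunnable frame in the stack trace, Netty appears to rename its worker threads, so the prefix may not survive in thread dumps. Daemon threads only hide the symptom, since the underlying failure to stop the threads on shutdown is still there.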