Replicator does not connect to endpoint when run in Docker container

I’m writing a Java app that will be responsible for doing some work and persisting that work to the database. The app is using CouchbaseLite 2.7.0 for Linux.

If I run the app from the IDE or run the executable directly from the command line, everything works fine. When I build the Docker image and run it inside a Docker container, however, it gets stuck at the point where the replicator attempts to connect to the endpoint, as can be seen by the following output:

I/CouchbaseLite/REPLICATOR:Replicator{@450b4ee,<-,Database{@247240e5, name='[redacted]'},URLEndpoint{url=wss://[redacted]:443/data}]: status changed: (0, 0) @C4ReplicatorStatus{level=2, completed=0, total=0, #docs=0, domain=0, code=0, info=0}
I/CouchbaseLite/REPLICATOR:Replicator{@450b4ee,<-,Database{@247240e5, name='[redacted]'},URLEndpoint{url=wss://[redacted]:443/data}] is connecting, progress 0/0, error: null

After this, nothing happens: no replicator status update or anything else, even with verbose logging turned on. HTTP requests do work, because before starting the replicator the app creates a session with our cloud backend using an HTTP request. So the issue appears to be specific to WebSockets.
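One way to narrow this kind of symptom down is to check, from inside the container, whether a plain TCP connection to the replication endpoint's host and port can be opened at all. The following is only a minimal sketch using the standard library. The class name `EndpointProbe` and its method are made up for this example, and the host and port must be supplied by you. It checks TCP reachability only and says nothing about the TLS handshake or the WebSocket upgrade.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Couchbase Lite.
final class EndpointProbe {

    // Returns true if a TCP connection to host:port can be opened within the timeout.
    // Any I/O failure (unknown host, refused connection, timeout) yields false.
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the host and port are passed on the command line.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 443;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

If this reports the endpoint as reachable both inside and outside the container, the difference between the two environments probably lies somewhere other than basic networking.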

We already have our Sync Gateway infrastructure set up and working with our other product lines. This app’s purpose is to perform some jobs as requested by the web app (or potentially other clients), persist that work to the database, and then close.

Is there any reason why the replicator can’t seem to connect from inside a Docker container?

CouchbaseLite is not aware of whether it is running inside a Docker container or not. It seems likely that the problem has something to do with Docker, rather than with Couchbase Lite.

In order to debug this, we are going to need a lot more than just two lines of log. Would you please set logging to DEBUG on ALL_DOMAINS, run your app, and then provide the complete record of a replicator’s connection attempt?

I agree that the problem is more likely to do with Docker than CouchbaseLite, but I was hoping that maybe others have encountered the same or similar issue, or might have some insight into what the solution might be on the Docker side?

Here is the full output I get when running the app in the Docker container with logging set to DEBUG on ALL_DOMAINS:
log

That’s literally all I get in the output. I’ve checked with my DevOps colleague and he is a bit stumped by it as well.

Ok. This one may be easy.

Notice this:
W/CouchbaseLite/REPLICATOR:Replicator{@36a5681a,<-,Database{@42a221bd, name='[redacted]'},URLEndpoint{url=wss://[redacted]:443/data}]: received unrecognized activity level:

This was a regression that appeared for a short period of time, I believe in CBL 2.8.0. In order to handle an infrequent corner-case, Core introduced a new replicator state, “STOPPING”. For a brief period, Core leaked that state to the platform code, which failed when it encountered it.

There are a couple of things that puzzle me:

  • This bug does not appear in any of the 2.7.x releases
  • The warning should end with the ordinal of the unexpected state (which I would expect to be ‘6’). I do not know why it isn’t there.

What I can guarantee is that the replicator code crashes hard, with an IllegalStateException, after printing that message, in all of the 2.8.x versions.
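To illustrate the failure mode described here, this is a rough sketch of how a platform layer might map a native activity level to an enum, and what happens when a level arrives that it does not know about. It is not the actual Couchbase Lite code. The class `ActivityLevels`, its methods, and the ordinal assignments are assumptions for the example. The only mapping confirmed by the logs in this thread is that level=2 is reported as "connecting".

```java
// Hypothetical illustration, not the real Couchbase Lite platform code.
final class ActivityLevels {

    // Assumed ordering; only CONNECTING == 2 is confirmed by the log output above.
    public enum ActivityLevel { STOPPED, OFFLINE, CONNECTING, IDLE, BUSY }

    // Strict mapping: an unknown level is treated as a programming error.
    public static ActivityLevel strict(int level) {
        ActivityLevel[] known = ActivityLevel.values();
        if (level < 0 || level >= known.length) {
            throw new IllegalStateException("unrecognized activity level: " + level);
        }
        return known[level];
    }

    // Tolerant mapping: an unknown level falls back to a caller-chosen state.
    public static ActivityLevel tolerant(int level, ActivityLevel fallback) {
        ActivityLevel[] known = ActivityLevel.values();
        return (level < 0 || level >= known.length) ? fallback : known[level];
    }

    public static void main(String[] args) {
        System.out.println(strict(2));             // a known level
        System.out.println(tolerant(6, ActivityLevel.BUSY)); // an unknown level, mapped to the fallback
    }
}
```

With a strict mapping like the first method, a newly introduced native state that leaks through stops the replicator with an exception. The tolerant variant shows one possible way to avoid that.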

It should be hard to drive that state. Perhaps there is something about the Docker environment that makes it more likely.

I believe that the 2.8.0 release is the only one that has the bug.

Ok, interesting. I did notice that warning before, but it appears when I run the executable from the command line outside of a Docker container as well, and everything works in that case.

However, I did upgrade to use the latest CouchbaseLite (2.8.3) in the app to see if that fixes it. Unfortunately it doesn’t, but I do get one bit of extra error output now (the warning you mentioned is gone): log

It’s failing to attach a thread to the Java VM. That might account for why I’m not getting any updates in the replicator change callback. Could this mean it’s not in fact a WebSocket/Docker issue, but a threading issue? It’s weird that it only happens within the Docker container, though.

That is a Core thread trying to send a logging message to the platform. I’ve never seen that happen before.

What version of Java are you using?

Also, the log clip that you reference shows no evidence of an error (the message about the “unrecognized activity level” is gone). Admittedly, though, Core might be trying to tell you about an error when it fails to send a logging message.

Again, more log output would be helpful.

That’s all the logging I could get: verbose, all domains, and everything I can get from Docker. The Java version running inside the Docker container is 1.8.0_212. However, the Dockerfile specifies the “Alpine” distribution of the OpenJDK. If I change this to the regular OpenJDK 8 image, everything now works! So clearly there is something missing in the Alpine distribution of the JDK that CouchbaseLite is looking for.
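For anyone who wants to catch this early, here is a minimal sketch of a startup check that warns when the app is running on an Alpine-based image. It relies on the `ID=` field of `/etc/os-release`, which Alpine sets to `alpine`. The class name `BaseImageCheck` and its methods are made up for this example. One plausible cause, not confirmed in this thread, is that Alpine ships musl libc instead of glibc, which native libraries built against glibc may not work with.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Couchbase Lite.
final class BaseImageCheck {

    // Returns true if the given os-release text declares ID=alpine.
    public static boolean isAlpine(String osRelease) {
        for (String line : osRelease.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("ID=")) {
                // The value may or may not be quoted.
                String id = trimmed.substring(3).replace("\"", "").trim();
                return id.equalsIgnoreCase("alpine");
            }
        }
        return false;
    }

    // Reads /etc/os-release, returning an empty string if it cannot be read.
    public static String readOsRelease() {
        try {
            return new String(Files.readAllBytes(Paths.get("/etc/os-release")), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        if (isAlpine(readOsRelease())) {
            System.err.println("Warning: running on Alpine; native libraries built for glibc may not work here.");
        } else {
            System.out.println("Base image does not identify itself as Alpine.");
        }
    }
}
```

Calling the check before opening the database gives an explicit message in the container logs, rather than a replicator that silently never leaves the connecting state.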

Fortunately the issue was indeed on the Docker side as we suspected.
Thanks very much for the support and feedback!

Implementations of JNI vary. We need to sort this out for OpenJDK. I’ve filed https://issues.couchbase.com/browse/CBL-1667 to track the issue.