Attachment replication

Hello,
Config:
CouchbaseLite version 2.1.0
SyncGateway 2.0.0

When I saved an attachment on a Couchbase Lite document, I got this error:

CouchbaseLiteException{domain=‘POSIXErrorDomain’, code=104, msg=Connection reset by peer}

SyncGateway logs:

2019-03-21T16:47:04.161Z HTTP: [6e2b1e97] #1171: → BLIP+WebSocket connection error: Invalid checksum f9b6201e; should be e13d356e
2019-03-21T16:47:04.161Z HTTP+: [6e2b1e97] #1171: → BLIP+WebSocket connection closed

This is a pretty low-level error that indicates some kind of networking problem — it might be a bug in our network code, or it might be data corruption going over the net.

Is it reproducible? How many times has it happened?
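
To illustrate why a corrupted frame ends in an "Invalid checksum" error like the one in the Sync Gateway log above, here is a minimal sketch. It assumes the checksum is a running CRC-32 over the frame contents, which is an assumption made for illustration and not something confirmed in this thread. The names `crc32_update`, `Frame`, `make_frame` and `verify_frame` are hypothetical, not Couchbase Lite or Sync Gateway APIs.

```cpp
#include <cstdint>
#include <string>

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise version.
// `crc` is the running checksum of all earlier data (0 at the start).
uint32_t crc32_update(uint32_t crc, const std::string& data) {
    crc = ~crc;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

// Hypothetical frame: a payload plus the sender's running checksum.
struct Frame {
    std::string payload;
    uint32_t checksum;
};

// Sender side: fold the payload into the running checksum and attach it.
Frame make_frame(uint32_t& running, const std::string& payload) {
    running = crc32_update(running, payload);
    return Frame{payload, running};
}

// Receiver side: recompute the running checksum and compare with the
// value the sender attached. A mismatch means the bytes changed in transit.
bool verify_frame(uint32_t& running, const Frame& frame) {
    running = crc32_update(running, frame.payload);
    return running == frame.checksum;
}
```

If anything between the two endpoints alters even a single bit of the payload, `verify_frame` returns false on the receiving side, and the only sensible reaction is to close the connection.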

The bug appears for every attachment sync from a mobile device.

I found that our reverse proxy, Traefik, is the cause: attachment sync works perfectly if the mobile device is connected directly to SG.

Are attachments synced via a POST request? Does the calculated checksum take the HTTP request headers into account? I think Traefik is probably modifying the headers.

@jean-maxime have you found a solution to this Traefik problem? I’m facing the very same problem. I didn’t want to lose the advantages of using Traefik.

@jens,

Is there any custom, non-standard header sent during the attachment transmission that Traefik might need to add while forwarding packets to and from the container?

Because I keep asking myself…

If this were a general problem with Traefik, even document replication wouldn’t work at all. But document replication works. Only attachment transmission fails.

In the meantime I have found this in the Traefik configuration docs:

https://docs.traefik.io/configuration/backends/docker/#custom-headers

If there are custom headers exchanged back and forth, it will not work correctly.

What do you guys think about this?

Regards,

Nuno

HTTP headers only apply during the initial WebSocket handshake. After that, the TCP connection’s protocol changes from HTTP to WebSockets, which don’t have anything like headers.

Worth noting though that someone reported a problem with Azure rejecting WebSocket messages longer than 4KB. We will send up to 16KB messages if the resource being transferred is larger than 4K; this tends to happen more with blobs than with documents.
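
As a rough illustration of the point about headers, here is a minimal sketch of parsing the fixed part of a WebSocket frame as laid out in RFC 6455. After the handshake, each frame carries only an opcode, a few flag bits, a payload length and (for client-to-server frames) a masking key, with no HTTP-style headers. `WsFrameHeader` and `parse_ws_header` are hypothetical names for this example, not part of any Couchbase library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed fields at the start of every WebSocket frame (RFC 6455, section 5.2).
struct WsFrameHeader {
    bool fin;               // final fragment of a message
    uint8_t opcode;         // 0x1 text, 0x2 binary, 0x8 close, ...
    bool masked;            // set on client-to-server frames
    uint64_t payload_len;   // payload length in bytes
    std::size_t header_len; // bytes used by the header, including mask key
};

// Returns false if `buf` does not yet contain a complete frame header.
bool parse_ws_header(const std::vector<uint8_t>& buf, WsFrameHeader& out) {
    if (buf.size() < 2) return false;
    out.fin = (buf[0] & 0x80) != 0;
    out.opcode = buf[0] & 0x0F;
    out.masked = (buf[1] & 0x80) != 0;

    uint64_t len = buf[1] & 0x7F;
    std::size_t pos = 2;
    // 126 means a 16-bit length follows, 127 means a 64-bit length follows.
    std::size_t ext = (len == 126) ? 2 : (len == 127) ? 8 : 0;
    if (buf.size() < pos + ext) return false;
    if (ext > 0) {
        len = 0;
        for (std::size_t i = 0; i < ext; ++i) {
            len = (len << 8) | buf[pos + i]; // big-endian
        }
        pos += ext;
    }
    if (out.masked) {
        if (buf.size() < pos + 4) return false; // 4-byte masking key
        pos += 4;
    }
    out.payload_len = len;
    out.header_len = pos;
    return true;
}
```

For example, the made-up bytes `82 FE 0F FA` followed by a 4-byte masking key describe a final, masked binary frame with a 4090-byte payload. A proxy that forwards the upgraded connection has nothing resembling a header to rewrite in such a frame, but it can still drop or cut the payload bytes that follow.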

@jens

Yes, I’m aware of that issue. I have tried what was described as a workaround (reducing the max packet size to 4096) but it didn’t work anyway. Something else is happening. I’m trying to dig it out.

How can I increase the log level of BLIP? I have tried changing lines 54 and 55 of BLIPConnection.cc to the following:

LogDomain BLIPLog("BLIP", LogLevel::Debug);
static LogDomain BLIPMessagesLog("BLIPMessages", LogLevel::Debug);

But it didn’t seem to increase the log level. Do I need to fully recompile?

What do I know so far?

a) a direct connection to the Sync Gateway without Traefik doesn’t cause the WebSocket to close
b) Traefik doesn’t log anything when this happens (as if nothing went wrong)
c) Sync Gateway logs “ERROR decompressing frame: inputLen=4090, remaining=0, output=0, error=unexpected EOF” when this happens, and the connection is terminated.

My theory is that Traefik is somehow cutting the packet and when it gets to the validation phase on Sync Gateway, it gets rejected and the connection is closed.

In order to talk with the Traefik developers and say ‘Hey! There is a problem with your software’, I need more data to support my claim. Developers usually don’t believe something is wrong with their software unless we can prove it to them with another program. Isn’t that right? :wink:

Sync Gateway says that the inputLen is 4090 bytes and that there is an unexpected EOF. I would like to track what is being sent from the client side to Sync Gateway and know the length of the packet that is sent just before the close happens.

Can you please advise on how to increase the log level of couchbase-lite-core?

Thanks!

Best regards,

Nuno

Your best bet here is to take a packet capture of the network traffic and analyse it with Wireshark.

These are pretty standard tools which hopefully the Traefik devs can understand without delving into any Couchbase details.