Sync Gateway only seems to be sending data to 3 devices at a time

I have a community edition Sync Gateway (2.8) pointing to a community edition Couchbase Server (6.6) on the same local network. Just one node of each on different Ubuntu servers.
Recently 9 million JSON documents were added to the database (on top of 7.5 million), and we have 20 iOS devices (mostly iPad Pro 2nd gen, iOS 14.3~) running community Obj-C Couchbase Lite 2.7.1.

The issue we appear to be having is that only a handful of devices (connecting externally) are syncing with the Sync Gateway. The others seem to be stuck and have not received any data since 21st January 2021 (when the 9 million documents were loaded), despite showing a status of “Syncing” on the device. I have tried restarting Sync Gateway and reopening the app.

Any data they create on their iPad is pushed up to our server fine; they just don’t seem to be pulling any data.

Attached below is the Sync Gateway config (with the password taken out, obviously). You’ll notice I have use_views set to true, as I found this to be faster, but that was done prior to the change in https://issues.couchbase.com/browse/CBG-821, so maybe indexes are OK now.
sync_gateway.json.zip (1.5 KB)

The user is salestool, and therefore connects to the salestool channel.
startReplicationFunction.txt.zip (1018 Bytes)

There’s also a process subscribed to the changes feed through a websocket, with these options:
https://192.168.1.11:4984/db/_changes?feed=websocket
{
  perMessageDeflate: false,
  agentOptions: {
    rejectUnauthorized: false
  }
}

Couchbase Server has 24GB RAM and 8 CPU cores.

Sync Gateway has 16GB RAM (only using 500MB though) and 4 CPU cores. Below is the response from /db
couchbase syncgateway db.json.zip (709 Bytes)

Let me know if you need anything else, I’ll see if I can get any logs off the iPad itself.

Our plan is to use Docker to create 3 Couchbase Server nodes and 2 Sync Gateway nodes as recommended, but this will take time to set up and test.

Annoyingly, my iPad started to sync fine when I tried to get the logs. Running netstat showed only 2 IPs connected to port 4984 (although I believe the other iPads still have the app open and are trying to replicate), which might explain why mine started to sync.
I have attached the replicator startup logs anyway, and maybe on Monday I can see if the replicator gets stuck (end of working day where I am now).
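
For anyone wanting to repeat the netstat check above, here is a rough Python sketch that counts distinct remote IPs with an established connection to port 4984. It assumes Linux `netstat -tn` style output; the function name `count_clients` and the sample addresses are made up for illustration and are not from the real server.

```python
from collections import Counter


def count_clients(netstat_output: str, port: int = 4984) -> Counter:
    """Count ESTABLISHED TCP connections per remote IP on a local port.

    Assumes the column layout of Linux `netstat -tn`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    clients = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED" or not local.endswith(suffix):
            continue
        # Drop the remote port; rpartition keeps IPv6 addresses intact.
        ip, _, _ = remote.rpartition(":")
        clients[ip] += 1
    return clients


# Hypothetical sample output (documentation-range IPs, not real clients).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.11:4984       203.0.113.5:51234       ESTABLISHED
tcp        0      0 192.168.1.11:4984       203.0.113.5:51240       ESTABLISHED
tcp        0      0 192.168.1.11:4984       198.51.100.7:40022      ESTABLISHED
tcp        0      0 192.168.1.11:4984       198.51.100.9:40100      TIME_WAIT
tcp        0      0 192.168.1.11:22         192.168.1.50:60000      ESTABLISHED
"""

for ip, connections in sorted(count_clients(SAMPLE).items()):
    print(ip, connections)
```

On the Sync Gateway host the input could come from something like `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`. One caveat: iPads that sit behind the same NAT will appear as a single IP, so a count of 2 IPs does not necessarily mean only 2 devices are connected.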

The version I have attached is also using Couchbase Lite 2.8.0, as that is what I have ready for our next live release. (I’m hoping this issue isn’t down to using 2.7.1 Lite with 2.8 Sync Gateway; I am sure mine got stuck even with 2.8 when trying this before.)

verbose replicator logs from iPad.log.zip (1.9 KB)

I think the info we’d need here to try and diagnose this is the logs from the device during (or after) a stuck state.

A bonus would be grabbing the Sync Gateway logs for the same timeframe, but from the sounds of it, the pull replicator is getting into a weird state and may not even be connecting (a PushAndPull replicator is, under the hood, two separate, independent replicators).

Unfortunately (or fortunately) I have been unable to get logs of the replication getting stuck, as miraculously all is fine now.

I think changing the CB Lite version on my iPad back and forth between 2.7.1 and 2.8 restarted the sync entirely from 0, and this then possibly resolved the other iPads getting stuck.

I thought it might have been an issue with the server config or Sync Gateway config only allowing a certain number of connections at a time, but I guess it was something else.

If it happens again, I’ll try and remember to check the device logs first.