Couchbase Lite Xamarin: thousands of replication errors - Start Replication - System.IndexOutOfRangeException: Index was outside the bounds of the array

Hello,
On Couchbase Lite for Xamarin (NuGet v2.7.1) we are swamped by errors of this kind (roughly 1,000 per day!):

Replicator+<>c__DisplayClass55_0.<StartInternal>b__0 ()
System.IndexOutOfRangeException: Index was outside the bounds of the array


System.Collections.Generic.HashSet`1[T].SetCapacity (System.Int32 newSize)
System.Collections.Generic.HashSet`1[T].IncreaseCapacity ()
System.Collections.Generic.HashSet`1[T].AddIfNotPresent (T value)
System.Collections.Generic.HashSet`1[T].System.Collections.Generic.ICollection<T>.Add (T item)
Couchbase.Lite.Sync.Replicator+<>c__DisplayClass55_0.<StartInternal>b__0 ()
Couchbase.Lite.Support.ThreadSafety.DoLocked (System.Action a)
Couchbase.Lite.Sync.Replicator.StartInternal ()
Couchbase.Lite.Sync.Replicator.<Start>b__35_0 ()
Couchbase.Lite.Support.SerialQueue.DispatchSync (System.Action a)
Couchbase.Lite.Sync.Replicator.Start ()
My.App.Droid.Services.CouchbaseImplementation.StartReplication ()
My.App.Droid.Services.CouchbaseImplementation.WaitAndRestartSync ()

Note: StartReplication() simply starts a replication, while WaitAndRestartSync() handles the scenario where a replication has stopped because the device went offline, and is responsible for restarting it.

The error count is 10,866 (!) across a population of approx. 15 devices (!!), hence I am trying to understand what is going on.

In the same piece of code, we also see hundreds of occurrences of this error:

FLSliceExtensions.FLEncode[TVal] (System.Collections.Generic.IDictionary`2[TKey,TValue] dict, LiteCore.Interop.FLEncoder* enc)
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

Stack trace:

System.Collections.Generic.Dictionary`2+Enumerator[TKey,TValue].MoveNext ()
LiteCore.Interop.FLSliceExtensions.FLEncode[TVal] (System.Collections.Generic.IDictionary`2[TKey,TValue] dict, LiteCore.Interop.FLEncoder* enc)
LiteCore.Interop.FLSliceExtensions.FLEncode (System.Object obj, LiteCore.Interop.FLEncoder* enc)
LiteCore.Interop.FLSliceExtensions.FLEncode (System.Object obj)
LiteCore.Interop.ReplicatorParameters..ctor (System.Collections.Generic.IDictionary`2[TKey,TValue] options) [0x00009] in <b1eafb08e4244435bdb17f2e5c03537e>:0
Couchbase.Lite.Sync.Replicator.StartInternal ()
Couchbase.Lite.Sync.Replicator.<Start>b__35_0 ()
Couchbase.Lite.Support.SerialQueue.DispatchSync (System.Action a)
Couchbase.Lite.Sync.Replicator.Start ()

Android OS: 8.0.0 and 9.0.0
Built with the latest Xamarin msbuild
Couchbase Lite Xamarin 2.7.1
App built for armeabi-v7a and arm64-v8a

I looked through similar topics on the forums, but none of them covers this specific issue.

Thanks for your help.

System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

This sounds like a threading issue and it’s weird that it happens while encoding the ReplicatorParameters.
@Sandy_Chuang Can you provide more info about how this could happen?

Hi @gmaggini, if you set Continuous in ReplicatorConfiguration, you should not manually call Replicator.Start() when the ReplicatorActivityLevel is Offline. Reachability will kick in: once it detects an available network connection, the replicator will retry the connection and resume.
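
As a rough sketch of what this advice looks like in code, assuming the Couchbase Lite .NET 2.x API (ReplicatorConfiguration, Replicator.AddChangeListener). The class name SyncService, the endpoint URL, and the console logging are placeholders for illustration and are not taken from the original app:

```csharp
using System;
using Couchbase.Lite;
using Couchbase.Lite.Sync;

// Hypothetical wrapper around one long-lived continuous replicator.
public sealed class SyncService
{
    private readonly Replicator _replicator;
    private readonly ListenerToken _token;

    public SyncService(Database database)
    {
        // Placeholder URL: substitute the real Sync Gateway endpoint.
        var target = new URLEndpoint(new Uri("wss://sync.example.com:4984/mydb"));
        var config = new ReplicatorConfiguration(database, target)
        {
            Continuous = true,
            ReplicatorType = ReplicatorType.PushAndPull
        };

        _replicator = new Replicator(config);
        _token = _replicator.AddChangeListener(OnStatusChanged);
    }

    public void Start() => _replicator.Start();

    public void Stop()
    {
        _replicator.RemoveChangeListener(_token);
        _replicator.Stop();
    }

    private void OnStatusChanged(object sender, ReplicatorStatusChangedEventArgs args)
    {
        switch (args.Status.Activity)
        {
            case ReplicatorActivityLevel.Offline:
                // Deliberately no Start() call here: a continuous replicator
                // is expected to resume by itself once the network is back.
                Console.WriteLine("Replicator offline, waiting for reachability");
                break;
            case ReplicatorActivityLevel.Stopped:
                // Reached after Stop() or after an error the replicator
                // treats as permanent; Status.Error is null on a clean stop.
                Console.WriteLine($"Replicator stopped, error: {args.Status.Error}");
                break;
        }
    }
}
```

Only the Stopped case would be a candidate for an application-level restart; Offline is left entirely to the library.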
-Sandy

Hi, we had to implement a “wait and restart sync” because the application runs behind a VPN. The reachability check succeeded (the device was connected), so the replicator did not report “offline”; instead it got back a “404”-type error because it could not find the server.

As per the documentation, Couchbase Lite stops retrying in that case:

"When a permanent error occurs (i.e., 404 : not found, 401 : unauthorized), the replicator (continuous or one-shot) will stop permanently."
(https://docs.couchbase.com/couchbase-lite/current/csharp.html#replication)

That sounds like incorrect behavior by the VPN. It should just be routing packets, not pretending to be a host it isn’t. What’s the exact error?

Good catch. Bad explanation from my side, apologies. What I meant is this scenario:

  1. The user opens the VPN client and connects to VPN
  2. The app starts and connects to Sync GW successfully, starts replication
  3. The device goes to an area without connectivity, VPN drops
  4. The device goes back to an area with internet connectivity, Internet connectivity resumes
  5. The VPN does not come back automatically for whatever reason
  6. The app now reports that the server is reachable, but because the VPN has not (yet?) reconnected, the replicator gets a 404, hence continuous replication stops

Just as an update: we are now trying to address the error by restarting the existing replicator instead of re-creating it. We are keeping it under observation.
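
Both exceptions above (HashSet growing inside Add, Dictionary modified during enumeration) are typical of a non-thread-safe collection being touched from more than one thread. Whether overlapping restart attempts are the actual cause here is an assumption, but if WaitAndRestartSync() can fire from several callbacks at once, one way to rule it out is to funnel every restart through a single gate. A minimal sketch in plain C#; RestartGate and its members are hypothetical names, not part of Couchbase Lite:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: at most one delayed restart is in flight at a time,
// so the wrapped start action never runs on two threads concurrently.
public sealed class RestartGate
{
    private readonly Action _start;   // e.g. () => replicator.Start()
    private readonly TimeSpan _delay; // wait before retrying
    private int _pending;             // 0 = idle, 1 = restart scheduled

    public RestartGate(Action start, TimeSpan delay)
    {
        _start = start ?? throw new ArgumentNullException(nameof(start));
        _delay = delay;
    }

    // Returns false when a restart is already scheduled and the request
    // was therefore ignored.
    public bool TryScheduleRestart()
    {
        if (Interlocked.CompareExchange(ref _pending, 1, 0) != 0)
        {
            return false;
        }

        _ = RunAsync();
        return true;
    }

    private async Task RunAsync()
    {
        try
        {
            await Task.Delay(_delay).ConfigureAwait(false);
            _start();
        }
        catch (Exception e)
        {
            Console.WriteLine($"Restart failed: {e}");
        }
        finally
        {
            // Reopen the gate so a later stop can schedule a new restart.
            Interlocked.Exchange(ref _pending, 0);
        }
    }
}
```

Usage would be along the lines of creating one gate per replicator, for example `new RestartGate(() => replicator.Start(), TimeSpan.FromSeconds(30))`, and calling `TryScheduleRestart()` whenever the replicator reports that it has stopped with an error.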

This still seems wrong. If the VPN isn’t up, then either:

  • the DNS lookup of a hostname behind the VPN will fail, resulting in an “invalid hostname” error; or
  • the lookup will succeed but connecting to that IP address will fail with an IP error like “no route to host” or “connection refused”.

You shouldn’t get an HTTP error like 404. The only reason I could think of for that to happen is that the hostname resolves to a different IP address inside and outside the VPN. But that seems pretty strange to me.
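
To see which of these cases actually occurs on an affected device, a small diagnostic could log the outcome of the DNS lookup and of a raw TCP connect at the moment replication stops. The following is a sketch using only standard .NET networking classes; EndpointProbe and DescribeAsync are hypothetical names, and the host and port passed in would be those of the Sync Gateway:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical diagnostic helper: reports whether a failure happens at the
// DNS stage or at the TCP connect stage, and which IP address was used.
public static class EndpointProbe
{
    public static async Task<string> DescribeAsync(string host, int port)
    {
        IPAddress[] addresses;
        try
        {
            addresses = await Dns.GetHostAddressesAsync(host).ConfigureAwait(false);
        }
        catch (SocketException e)
        {
            return $"DNS lookup for {host} failed: {e.SocketErrorCode}";
        }

        if (addresses.Length == 0)
        {
            return $"DNS lookup for {host} returned no addresses";
        }

        // Only the first resolved address is probed in this sketch.
        var address = addresses[0];
        using (var client = new TcpClient(address.AddressFamily))
        {
            try
            {
                await client.ConnectAsync(address, port).ConfigureAwait(false);
                return $"TCP connect to {address}:{port} succeeded";
            }
            catch (SocketException e)
            {
                return $"TCP connect to {address}:{port} failed: {e.SocketErrorCode}";
            }
        }
    }
}
```

Comparing the logged IP address with the VPN up and with it down would show whether the hostname resolves differently in the two situations.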