CBL 2.x auto conflict resolution results in unresolved conflicts

I’m seeing cases where CBL 2.x auto conflict resolution leaves some document conflicts unresolved until another revision of the document is saved. This issue persists through app and replicator restarts. Based on this blog article (https://blog.couchbase.com/document-conflicts-couchbase-mobile/), am I correct in understanding that these conflicts should have been resolved automatically?

The iOS DocumentReplicationListener on a continuous push and pull replicator is returning the following error:

Domain=CouchbaseLite Code=10409 “conflicts with newer server revision” UserInfo={NSLocalizedDescription=conflicts with newer server revision}

Version Info:
CBL iOS 2.7.1
SGW 2.7.0

Conflicts are resolved on pull, not push. When a push hits a conflict it just gives up on that document and moves on, on the assumption that a subsequent pull will resolve the same document.

If it’s a continuous replicator and there are no other changes to the database, will the pull that resolves the conflict still occur?

The question itself starts from the wrong premise. The reason a conflict happens, as you know, is that there are two different versions that are not in the same revision line (one local and one remote). To do the resolution you need to have both revisions available. Push and pull are independent and do not communicate with each other, so all push knows is “the server has a different version than me” and therefore it cannot do anything. The pull realizes “this thing I just got is different than the thing that is already stored” and therefore has both versions. This happens regardless of replicator type (though there is another issue with one-shot where the resolved document is not pushed on the same run).

@borrrden We’ve also seen a lot of these “conflicts with newer server revision” errors, and they tend to originate from a single user who created a new account and is only using one device. That means there aren’t any remote document changes made by some other device or user that it should be conflicting with.

If Couchbase is handling concurrency correctly, how would this error otherwise be possible? It seems like a possible threading issue.

We also see the same behavior @hyling described where the rejections won’t get persisted to remote because subsequent pulls don’t happen and every future push gets rejected.

All I can say from a high level is that if Sync Gateway returns a 409, that means the revision submitted to it was not on the same tree as the version it has. There are probably a hundred different ways to cause that to happen. These 409 errors from Sync Gateway are not usually a problem for the replication protocol though, so if you don’t notice any other bad behavior then ignore them.
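To make the 409 rule concrete, here is a minimal Python sketch of a toy server that accepts a push only when the new revision’s parent is the revision it currently holds. `ToyServer`, its `push` method, and the revision IDs are hypothetical and only model the rule described above; this is not Sync Gateway’s actual implementation.

```python
class ToyServer:
    """Toy stand-in for a sync server (an assumption, not Sync Gateway code)."""

    def __init__(self):
        # doc_id -> revision id the server currently holds
        self.current = {}

    def push(self, doc_id, parent_rev, new_rev):
        # Accept only if the pushed revision extends the server's current one.
        if self.current.get(doc_id) != parent_rev:
            return 409  # pushed revision is on a different branch
        self.current[doc_id] = new_rev
        return 201


server = ToyServer()
print(server.push("doc1", None, "1-a"))   # 201: first revision of the document
print(server.push("doc1", "1-a", "2-b"))  # 201: extends the current revision
print(server.push("doc1", "1-a", "2-c"))  # 409: parent 1-a is no longer current
```

In this model the third push fails even though only one writer is involved: it is enough for the client to build on a parent revision the server has already moved past.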

Unfortunately the behavior we’re seeing is that new changes aren’t syncing for an indeterminate period of time and it’s not clear how they’ll resolve themselves; that effectively breaks cross-device sync.

I’ll likely be reaching out via the Enterprise support team once I dig in more since this issue becomes a release blocker for us if the behavior is indeterminate like this.

Sorry to hear that, but as you can well imagine the devil is in the details for this situation. The enterprise support staff should be able to get enough detailed information over to engineering if that is the route that gets taken. Probably far more effectively than the forums.

@borrrden @rob-keepsafe What I was trying to understand is whether the 10409 errors should be possible when I’m using the default auto conflict resolution with a continuous push and pull replicator in CBL 2.x. I was trying to determine whether I was expected to handle the conflict so I can file a bug report. It’s now sounding like this is not expected behavior.

From my testing this appears to be a race condition. The only way I’ve been able to reproduce it is to start both CBL clients with the continuous replicator, then disconnect and reconnect one of the clients from the network. Even then it’s hard to reproduce: on average roughly 1 in 200 tries.

If I use a one-shot pull then a one-shot push replicator, or start a continuous push and pull replicator after I’ve manually created a conflict condition on the server, I still see the push replicator return the 10409 error, but it’s later resolved in the pull replicator. However, when the conflict error occurs because of a network interruption while the continuous replicator is running, the conflict is not resolved until another revision of the document is saved on any client. The problem persists beyond replicator restarts and app restarts, which, as Rob points out, breaks sync.

The push and pull are racing each other on purpose so it depends on which one gets there first. The two expected scenarios are:

  1. Push is reached first, in which case you get a 10409 error which is then resolved later when the puller gets the conflicting revision pulled
  2. Pull is reached first, in which case you receive no 10409 because the pusher will push the result of the local conflict resolution instead.
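The two orderings can be sketched with a small Python model. Everything here is hypothetical: the `run` function, the revision IDs, and the `3-merged` result only illustrate the ordering described in the list above, not how the replicator is actually implemented.

```python
def run(order):
    """Return the events produced when push and pull run in the given order."""
    server_rev = "2-remote"                      # revision the server holds
    local_rev, local_parent = "2-local", "1-a"   # conflicting local edit
    events = []
    for step in order:
        if step == "push":
            if local_parent != server_rev:
                # Push only knows the server has something different.
                events.append("push: 409")
            else:
                server_rev = local_rev
                events.append("push: ok " + local_rev)
        elif step == "pull":
            if local_rev != server_rev and local_parent != server_rev:
                # Pull has both revisions in hand, so it can resolve.
                local_rev, local_parent = "3-merged", server_rev
                events.append("pull: resolved")
            else:
                events.append("pull: nothing to do")
    return events


print(run(["push", "pull", "push"]))  # ['push: 409', 'pull: resolved', 'push: ok 3-merged']
print(run(["pull", "push"]))          # ['pull: resolved', 'push: ok 3-merged']
```

In this model the 409 appears only when push runs before pull, and it clears once the pull has resolved the conflict and the merged revision is pushed.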

To reiterate, just getting 10409 is not enough to file a bug report; this is expected behavior. If you are noticing some other bad behavior, that can be reported, but unless you have a way to reproduce it the journey to find a fix will be a long one.

I suspect what’s happening is scenario #1, but it appears the puller is not behaving as intended in some cases because it is not resolving the conflicting revision. I agree that fixing bugs that are hard to reproduce is a nightmare, but sometimes more data is helpful. Besides the SGW logs and the logs from the CBL clients, is there anything else that would help with getting a better understanding of this issue? It looks like someone else reported a similar problem through GitHub issues in 2018. Sadly I can’t see the status because the issue was moved to enterprise customer support. Is that something you have access to? Could you tell us if that was resolved?

In the end that issue was resolved in version 2.5.0. Here are the tickets that came out of the 3-month investigation:

Relevant PR: https://github.com/couchbase/couchbase-lite-core/pull/697/

Hey @rob-keepsafe did you solve the problem?
I have the exact same problem. Even with just a single device I get 409 push errors and the document stops syncing. @borrrden said that a subsequent pull will resolve the conflict, but what about documents that are mainly updated by the device? In that case I may not get a pull for that document for weeks, so the data on the server is out of date!
Don’t know what to do.

@losapevo Sadly I’ve found no workarounds for this so far but I’m hoping to have more time to spend on this issue later this year. I’ll share my findings.