Conflicts for _removed documents?

We found out today that it is possible to have conflicts with _removed documents, and I’m running low on options for dealing with this scenario.

Consider the following sequence of events:

  1. Add document locally via CBLite. Ensure it pushes to the server.
  2. Go offline.
  3. Save more revisions locally.
  4. On the server (sync gateway), remove the user’s channel from the document.
  5. Make another server-side change (add a property “foo”).
  6. Go back online on the device and let it pull down the changes (mainly the _removed doc).
  7. Couchbase Lite chooses the latest offline revision as the winner, and is in conflict with the _removed revision.

At this point, I have two options: use the _removed document as the winner (which CBLite did not choose as the current revision), or use my document as the winner.

The former option forces me to discard any local changes that may have been made to the document and accept the removal. The latter results in overwriting the document on the server, which is sort of crazy. My local document has no awareness of the additional modifications made in step 5 (it can’t pull down the changes without channel access), yet the sync gateway still accepts my out-of-date local revision as the “winner” and overwrites what used to be the current revision, without giving me (CBLite) an opportunity to resolve any conflicts.

Even if I do an open_revs=all against the sync gateway, the server document that contained the property “foo” is gone.

Any guidance at all would be appreciated as to how to deal with removals and conflicts.

I think the key thing missing here is that read access and write access are controlled in two different places. Losing read access does not take away write access, as you’ve discovered. You should be able to enforce the lack of write access in your sync function as well. One part confuses me, though:

What you described is what I would expect if your local revision history had a higher generation than the remote one (in a conflict, the longer revision history wins by default), so the fact that your “winner” changes remotely is not surprising by itself. However, it should be in conflict at that point. If you’ve enabled no-conflicts mode in SG 2.0, that’s another story, and your only recourse is to reject the write in your sync function.
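
To make the default winner rule concrete, here is a minimal TypeScript sketch of how a winner might be picked among conflicting leaf revisions. Only the "higher generation wins" part comes from this thread; preferring live leaves over tombstones and breaking ties on the digest are my assumptions about the usual CouchDB-style ordering, and `Leaf`, `pickWinner`, and the revision IDs are hypothetical names for illustration.

```typescript
// A leaf revision as a resolver might see it. Field names are illustrative.
interface Leaf {
  revId: string;    // "<generation>-<digest>", e.g. "4-client"
  deleted: boolean; // true for tombstones
}

function generation(revId: string): number {
  return parseInt(revId.slice(0, revId.indexOf("-")), 10);
}

function digest(revId: string): string {
  return revId.slice(revId.indexOf("-") + 1);
}

// Assumed ordering: live leaves beat tombstones, then the higher generation
// wins, then the lexicographically higher digest breaks the tie.
function pickWinner(leaves: Leaf[]): Leaf {
  if (leaves.length === 0) throw new Error("no leaf revisions");
  return leaves.reduce((best, leaf) => {
    if (best.deleted !== leaf.deleted) return best.deleted ? leaf : best;
    const diff = generation(leaf.revId) - generation(best.revId);
    if (diff !== 0) return diff > 0 ? leaf : best;
    return digest(leaf.revId) > digest(best.revId) ? leaf : best;
  });
}

// The situation described above: a _removed stub at generation 3 versus an
// offline edit at generation 4. A _removed stub is not a tombstone.
const winner = pickWinner([
  { revId: "3-removed", deleted: false },
  { revId: "4-client", deleted: false },
]);
// winner.revId === "4-client"
```

Under these assumptions the offline edit wins simply because its history is longer, which matches the behavior reported in the question.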

Indeed. Our sync function is rather robust in terms of authorization and ensuring that the user making a change is actually allowed to make that change (based on roles, channel access, etc). But, this case is slightly different since all authorization checks succeed, yet the document is accepted by the sync gateway and no one has an opportunity to resolve the conflict.

This implies that when channels are removed, a separate list of authorized users (or something similar) must be maintained in the document so that the sync function can accept or reject the edit. But it would only be useful in this scenario, where a removal is in conflict with a local change. When no conflicts exist, the removal is received on the client and the document goes away, so there is no way to modify a removed document and no need to handle a special case in the sync function.

That is correct. The local revision has a higher generation than the _removed document, but it is in direct conflict with other revisions that already exist on the server (the server doc kept making updates). The crazy thing is that after the local document is pushed to the server, open_revs does return conflicts (which the client cannot resolve). However, none of the conflicts contain the previous server revision prior to the client push… so the data is effectively gone, and no one was ever able to resolve it. It’s like the sync gateway just accepted whatever the client sent as its “latest” and the previous revision content is gone.

I have tests that re-create the above scenario, so I can give more details and examples.

Is there any kind of “rule of thumb” to abide by here? For example, if a removal is one of the conflicting revisions, should the removal always win? Clearly someone is taking your access away, and updating the server document after the fact with a local document causes worse issues (in this case, data loss).

The document is accepted by sync gateway if your sync function allows it to be. The sync function can reject incoming revisions if necessary. There is no authorized-users list for writes; a write goes through simply by not being rejected by the sync function.

I was suggesting that it also has a higher generation than the server edits, in which case when it is accepted by the sync function, it becomes a winning conflict on the server side.

They have been superseded by new revisions, both by the client and directly via the server. The previous revision content is not gone per se; it is just not a leaf anymore. I imagine that if you query sync gateway and pass the revision ID as a query parameter, you will find it is still intact, unless it has been compacted and/or purged by Couchbase Server. You won’t be able to resolve these conflicts on the client side unless read access is granted again, at which point you will receive the server edits on the client. You can resolve them via the Sync Gateway API if you wish, though.
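
As a rough illustration of "pass the revision ID as a query parameter", the sketch below only builds the two request URLs discussed in this thread: one for a specific revision (`rev`) and one for all leaf revisions (`open_revs=all`). The base URL, database name, document ID, and revision ID are placeholders, and using the admin port is my assumption, since the user in question has lost read access. `docUrl` is a hypothetical helper, not part of any Couchbase SDK.

```typescript
// Build a Sync Gateway document URL with query parameters.
function docUrl(
  base: string,
  db: string,
  docId: string,
  query: Array<[string, string]>
): string {
  const path = `${base}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
  const qs = query
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return qs.length > 0 ? `${path}?${qs}` : path;
}

// Placeholder values; substitute your own host, database, and IDs.
const ADMIN_BASE = "http://localhost:4985";

// Fetch one specific (possibly non-leaf) revision body, if it still exists.
const byRev = docUrl(ADMIN_BASE, "mydb", "doc1", [["rev", "5-abc"]]);

// Fetch every current leaf revision, i.e. the conflicting branches.
const allLeaves = docUrl(ADMIN_BASE, "mydb", "doc1", [["open_revs", "all"]]);
```

A plain GET against the first URL is how you would check whether the superseded server revision body is still stored.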

Again, this is read access that is lost. Write access is not considered on the client side other than “try to write, and see if it is rejected or not.” But you seem to have some logic for determining read access, so why not apply that to write access as well? If the criteria for read access are not met, you can reject the incoming revision.
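
A minimal sketch of that idea, written in TypeScript so it can run outside Sync Gateway. A real sync function is plain JavaScript with the signature `function (doc, oldDoc)`, and `requireAccess` and `channel` are helpers the gateway provides; here they are passed in as a hypothetical `Helpers` object and faked by `helpersFor`. The `channels` property name is an assumption about the document schema. Whether `oldDoc` reflects the server's latest revision when a conflicting branch is pushed is not established in this thread, so treat that as an assumption too.

```typescript
// Minimal document shape; "channels" as the routing property is an assumption.
interface Doc {
  channels?: string[];
}

// Stand-ins for the helpers Sync Gateway injects into a real sync function.
interface Helpers {
  requireAccess(channels: string[]): void; // throws unless the user can read one of them
  channel(channels: string[]): void;       // routes the document to channels
}

function sync(doc: Doc, oldDoc: Doc | null, h: Helpers): void {
  if (oldDoc !== null) {
    // Reuse the read criteria as a write check: the writer must still be
    // able to read the revision being replaced.
    h.requireAccess(oldDoc.channels ?? []);
  }
  h.channel(doc.channels ?? []);
}

// Test double: a user who can read `userChannels`; routed channels are recorded.
function helpersFor(userChannels: string[], routed: string[]): Helpers {
  return {
    requireAccess(channels: string[]): void {
      if (!channels.some((c) => userChannels.indexOf(c) >= 0)) {
        throw { forbidden: "missing channel access" };
      }
    },
    channel(channels: string[]): void {
      routed.push(...channels);
    },
  };
}

// The user's channel was removed on the server, so this write is rejected.
let rejected = false;
try {
  sync({ channels: ["alice"] }, { channels: ["ops"] }, helpersFor(["alice"], []));
} catch (e) {
  rejected = true;
}
// rejected === true
```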

I spent some time walking through the scenario, and these are the steps:

  1. Create doc on client, save 2 revisions, push to server, go offline
  2. Server doc and client doc are at revision 2
  3. Make 2 more changes on client while offline. Client rev is now 4, server is still rev 2
  4. Fetch server doc through SG. Remove channels. Server rev is now 3
  5. Modify server doc 2 more times. Server rev is now 5, client rev is 4
  6. Bring client back online
  7. _removed revision 3 is pulled from server, is in conflict with revision 4 created locally
  8. Server revs 4 and 5 are not replicated (no channel access)
  9. CBLite chooses client revision 4 as winner (3 is the removal and shorter rev tree)
  10. Delete rev 3 (_removed doc)
  11. Client rev 5 is created now (no conflicts remain according to CBLite)
  12. Recall that server rev is also at 5 from step 5 above
  13. CBLite pulls down the server rev 5 as another _removed document (not sure why)
  14. Now the server rev 5 and client rev 5 are in conflict
  15. This time, CBLite chooses the client rev 5 (the non _removed one) as the winner (must have sorted higher)
  16. CBLite deletes the server rev 5 (_removed doc) and creates rev 6 on the client as the new revision
  17. Client pushes up revision 6 and the tombstone of the server rev 5 (which used to be the current revision on the server)
  18. Now the current rev is 6, which does not contain any server changes made in rev 5

Clearly, the server rev 5 that is being sent down again as a removal and then deleted is causing the local doc to be the winning revision again, and the server doc was effectively deleted by conflict resolution on the client.

So… I guess the lesson here is… don’t delete _removed documents as part of conflict resolution? If that is the case… how do you actually resolve the conflict then? And also figure out why I am getting the removed document twice for different revisions?

I’m not sure why the removed document is coming through twice, unless access was accidentally added and then removed again. That is a mystery that I’d have to see in action. The rest sounds more or less like the correct flow to me.

This is probably a good idea. The thing about these removed revisions is that they share the same revision ID as the real document, so a modification to them will be treated the same as a modification to the server revision if pushed. Modifying data that you don’t have read access to can lead to trouble. I would suggest that you leave the conflict in place until you get access back again (at which point you could resolve it with all the proper information). If you must resolve it, then the only recourse would be to delete the other branch with the local modifications. This is a rather detailed situation, so this is an untested answer, but working things out logically in my mind is where this leads me. Official support channels should be able to give a more in-depth analysis.
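
The suggestion above can be written down as a small decision function. This is only a sketch of the policy, not Couchbase Lite API code: `LeafRev`, `Resolution`, and `planResolution` are hypothetical names, and `removed` stands for a leaf whose body carries `_removed: true`.

```typescript
interface LeafRev {
  revId: string;
  removed: boolean; // true when the revision body is a _removed stub
}

interface Resolution {
  action: "no-removal" | "leave-conflict" | "discard-local";
  tombstone: string[]; // revision IDs to delete locally
}

// Policy: never tombstone a _removed leaf. Either leave the conflict in
// place until access returns, or, if resolution is mandatory, drop the
// local branches instead.
function planResolution(leaves: LeafRev[], mustResolve: boolean): Resolution {
  const hasRemoval = leaves.some((l) => l.removed);
  if (!hasRemoval) {
    // Ordinary conflict: handled by whatever resolver the app already has.
    return { action: "no-removal", tombstone: [] };
  }
  if (!mustResolve) {
    return { action: "leave-conflict", tombstone: [] };
  }
  return {
    action: "discard-local",
    tombstone: leaves.filter((l) => !l.removed).map((l) => l.revId),
  };
}

const plan = planResolution(
  [
    { revId: "3-removed", removed: true },
    { revId: "4-client", removed: false },
  ],
  true
);
// plan.action === "discard-local", plan.tombstone contains only "4-client"
```

The point of the design is that the `_removed` leaf never appears in `tombstone`, so the client never pushes a deletion of a server revision it cannot read.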