Restart a replication when stopped

Good morning,
Reading the documentation at https://docs.couchbase.com/couchbase-lite/2.7/java-android.html#replication, I found the following:

When a permanent error occurs (e.g., 404: not found, 401: unauthorized), the replicator (continuous or one-shot) will stop permanently.

My first question is: when is an error considered “permanent”? I fully understand the 404 and 401 errors, but are there other cases?

Second, to handle these two cases (which never happen, because the replicator is always configured correctly) as well as any other permanent errors, I added this to the code:

// Continuous replicator: log every status change and restart it as soon as it reports STOPPED.
this.replicator.addChangeListener(new ReplicatorChangeListener() {
    @Override
    public void changed(@NonNull ReplicatorChange change) {
        log_i("REPLICATOR:" + change.getStatus());
        if (change.getStatus().getActivityLevel()
                == AbstractReplicator.ActivityLevel.STOPPED) {
            replicator.start();
        }
    }
});

I am wondering whether restarting the replicator is a good approach, or whether it’s better to create a new one.
The main issue with creating a new one is dealing with all the listeners attached to it, but please let me know what your best practice is.
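If you do end up creating a new replicator, one way to keep the listener bookkeeping manageable is to hold every listener in a single registry and re-attach all of them to each new instance. This is a minimal sketch under stated assumptions: `ListenerHub`, `ReplicatorHandle` and `FakeReplicator` are hypothetical names (not Couchbase Lite API), and events are plain strings so the sketch runs on its own. With the real library, `attachTo` would call `addChangeListener(...)` on the new `Replicator`, and the `ListenerToken`s returned for the old instance could be passed to `removeChangeListener(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical minimal view of "something you can attach listeners to". */
interface ReplicatorHandle<E> {
    void addListener(Consumer<E> listener);
}

/**
 * Hypothetical registry: the app registers its listeners once, then wires
 * every replicator it creates (original or replacement) through attachTo().
 */
final class ListenerHub<E> {
    private final List<Consumer<E>> listeners = new ArrayList<>();

    void register(Consumer<E> listener) {
        listeners.add(listener);
    }

    /** Attaches every registered listener, in registration order, to the given instance. */
    void attachTo(ReplicatorHandle<E> replicator) {
        for (Consumer<E> listener : listeners) {
            replicator.addListener(listener);
        }
    }
}

/** Toy replicator used only to exercise the sketch: it stores listeners and fans out events. */
final class FakeReplicator<E> implements ReplicatorHandle<E> {
    private final List<Consumer<E>> attached = new ArrayList<>();

    @Override
    public void addListener(Consumer<E> listener) {
        attached.add(listener);
    }

    void emit(E event) {
        for (Consumer<E> listener : attached) {
            listener.accept(event);
        }
    }
}
```

With this arrangement, wiring a replacement replicator is a single `hub.attachTo(newReplicator)` call, so no listener can be forgotten.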

Thanks for any feedback. Have a nice day.

Internally there’s a list of error codes that are considered ‘permanent’, or at least ‘not transient’, that stop the replicator.

I am wondering if it is a good approach the restart of a replicator

Technically this will work, but it’s not the right solution. Instead, you should notify the user of the problem and let them choose whether to retry. Otherwise you can end up retrying over and over and over, while the poor user wonders why their data isn’t updating. (When the answer is that they mistyped their password…)
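A minimal sketch of that advice, assuming the app has some way to show a message together with a Retry action (a dialog, snackbar or notification). `StopHandler` and its callbacks are hypothetical names, not Couchbase Lite API. Inside the real change listener one might call `handler.onStopped(error == null ? null : error.getMessage())`, where `error` is `change.getStatus().getError()`.

```java
import java.util.function.BiConsumer;

/**
 * Hypothetical helper: when the replicator stops because of an error,
 * tell the user and hand them a retry action instead of restarting automatically.
 */
final class StopHandler {
    private final BiConsumer<String, Runnable> notifyUser; // shows the message plus a "Retry" action
    private final Runnable restart;                        // e.g. () -> replicator.start()

    StopHandler(BiConsumer<String, Runnable> notifyUser, Runnable restart) {
        this.notifyUser = notifyUser;
        this.restart = restart;
    }

    /**
     * Call when the replicator reports STOPPED. errorMessage is null for a clean stop.
     * Returns true if the user was notified.
     */
    boolean onStopped(String errorMessage) {
        if (errorMessage == null) {
            return false; // clean stop, e.g. the app called stop() itself: nothing to report
        }
        // Nothing restarts here; the restart only runs if the UI invokes the Runnable.
        notifyUser.accept(errorMessage, restart);
        return true;
    }
}
```

The design choice is that the replicator never restarts on its own: the retry action only runs if the user triggers it, for example after correcting a mistyped password.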

The whole point of the ‘transient’ vs. ‘permanent’ distinction is whether the error is likely to recur. If we decided the error warranted stopping the replicator, that means it’s unlikely to fix itself simply by trying again and again.

Thanks Jens,
That sounds reasonable. Our configuration is straightforward: the endpoint and credentials are hard-coded, so I am not worried about the setup, but I understand the point.

I am asking because I was testing the replicator on an unstable network, i.e. turning the Wi-Fi on and off to make Sync Gateway (SG) reachable and unreachable in quick succession. In a full day of testing, the replicator stopped only once.

If you say the code works properly, I will log everything for this case :slight_smile: and try this approach… Thanks for the feedback.

The overall point here is that for permanent errors there is reasonable doubt that they will recover on their own, so retrying has a good chance of being futile. That doesn’t make recovery impossible, which is why we give the final word back to the consuming code (in the form of the change callback reporting STOPPED with an error), so that, as with compiler warnings, you can decide to override the decision and start again.

Strictly speaking, what we store is a list of predefined transient error codes; if an error doesn’t match any of those, it’s considered permanent. You can see a partial list in the Handling Network Errors section of the documentation (though 1001 seems to be mislabeled). However, if you suspect that the error condition in your case is recoverable (the library has no sane way to know that), you can restart the replicator immediately, or after a reasonable delay.
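The classification described above (a fixed list of transient codes, everything else permanent) and an app-side restart "after a reasonable delay" can be sketched as follows. `RetryPolicy` is a hypothetical helper, and the codes in `TRANSIENT_CODES` are an illustrative assumption (common "try again later" HTTP statuses), not Couchbase Lite's actual internal list. The delay doubles on each attempt, is capped, and the policy gives up after a fixed number of attempts so that a genuinely permanent error doesn't turn into an endless loop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper, not Couchbase Lite API.
 * TRANSIENT_CODES is an ASSUMED, illustrative list, not the library's real one.
 */
final class RetryPolicy {
    static final Set<Integer> TRANSIENT_CODES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(408, 429, 500, 502, 503, 504)));

    private final int maxAttempts;
    private final long baseDelayMs;
    private final long maxDelayMs;

    RetryPolicy(int maxAttempts, long baseDelayMs, long maxDelayMs) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Transient only if the code is on the list; anything else counts as permanent. */
    static boolean isTransient(int code) {
        return TRANSIENT_CODES.contains(code);
    }

    /** Delay before restart attempt `attempt` (0-based): base * 2^attempt, capped; -1 means give up. */
    long delayBeforeRestartMs(int attempt) {
        if (attempt < 0 || attempt >= maxAttempts) {
            return -1;
        }
        long delay = baseDelayMs << Math.min(attempt, 30); // clamp the shift to avoid overflow for realistic base delays
        return Math.min(delay, maxDelayMs);
    }

    /** Schedules `restart` (e.g. () -> replicator.start()) after the computed delay; false if giving up. */
    boolean scheduleRestart(ScheduledExecutorService executor, Runnable restart, int attempt) {
        long delay = delayBeforeRestartMs(attempt);
        if (delay < 0) {
            return false;
        }
        executor.schedule(restart, delay, TimeUnit.MILLISECONDS);
        return true;
    }
}
```

Resetting the attempt counter once the replicator is back to IDLE or BUSY is left to the caller in this sketch.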

:+1: perfect. Thanks for your support.

I really hope you’re not hardcoding credentials in a production app :scream:

We are still in development; that’s why I am not worried at the moment.
Thanks guys. Have a nice day