Restart a replication when stopped

astentv · September 15, 2020, 9:56am

Good morning,
I was reading the documentation at https://docs.couchbase.com/couchbase-lite/2.7/java-android.html#replication, which says:

When a permanent error occurs (i.e., 404 : not found, 401 : unauthorized), the replicator (continuous or one-shot) will stop permanently.

My first question is: when is an error considered “permanent”? I fully understand the 404 and 401 cases, but are there others?

Second, to handle these two cases (which should never happen, because the replicator is always configured correctly) as well as other permanent errors, I added this to the code:

// continuous replicator
this.replicator.addChangeListener(new ReplicatorChangeListener() {
    @Override
    public void changed(@NonNull ReplicatorChange change) {
        log_i("REPLICATOR:" + change.getStatus());
        // Restart as soon as the replicator reports STOPPED
        if (change.getStatus().getActivityLevel()
                == AbstractReplicator.ActivityLevel.STOPPED) {
            replicator.start();
        }
    }
});

I am wondering whether restarting the replicator is a good approach, or whether it is better to create a new one.
The main issue with creating a new one is dealing with all the listeners attached to the old one, but please let me know what the best practice is.

Thanks for any feedback. Have a nice day.

jens · September 15, 2020, 4:54pm

Internally there’s a list of error codes that are considered ‘permanent’, or at least ‘not transient’, that stop the replicator.

I am wondering if it is a good approach the restart of a replicator

Technically this will work. But it’s not the right solution. Instead you should notify the user of the problem, and give them the choice whether to retry. Otherwise you can end up trying over and over and over and over, while the poor user wonders why their data isn’t updating. (When the answer is that their password was entered wrong…)

The whole point of the ‘transient’ vs. ‘permanent’ errors is whether the error is likely to recur. If we decided the error was worth stopping the replicator, that means it’s unlikely to fix itself simply by trying again and again.
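To make the "ask the user" idea concrete, here is a minimal, library-free sketch. Everything in it is hypothetical: `StoppedHandler`, the `askUserToRetry` hook, and the `restart` callback are illustration names, not Couchbase Lite API. In a real app the restart callback would presumably wrap `replicator.start()`, and the prompt would be an asynchronous dialog rather than a synchronous predicate.

```java
import java.util.function.Predicate;

/** Hypothetical helper: restarts only when the user agrees to retry. */
final class StoppedHandler {
    // Assumed UI hook: shows the error and returns true if the user chose "retry".
    private final Predicate<String> askUserToRetry;
    // Assumed restart action, e.g. a wrapper around replicator.start().
    private final Runnable restart;

    StoppedHandler(Predicate<String> askUserToRetry, Runnable restart) {
        this.askUserToRetry = askUserToRetry;
        this.restart = restart;
    }

    /**
     * Call this when the replicator reports STOPPED.
     *
     * @param errorMessage description of the error, or null for a clean stop
     * @return true if a restart was triggered
     */
    boolean onStopped(String errorMessage) {
        if (errorMessage == null) {
            return false; // stopped without an error: nothing to retry
        }
        if (!askUserToRetry.test(errorMessage)) {
            return false; // user declined, so stay stopped
        }
        restart.run();
        return true;
    }
}
```

The point of the design is that a stop with an error never turns into a silent retry loop: the user sees the problem (a wrong password, for example) before anything restarts.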

astentv · September 16, 2020, 7:29am

Thanks Jens,
that sounds reasonable. Our configuration is straightforward: the endpoint and credentials are hard-coded. So I am not worried about the setup, but I understand the point.

I am asking because I was testing the replicator on an unstable network, that is, turning the Wi-Fi on and off to make Sync Gateway (SG) quickly reachable and unreachable. In a full day of testing, the replicator stopped only once.

If you say the code works properly, I will log all the details of this case and try this approach. Thanks for the feedback.

borrrden · September 16, 2020, 11:50am

The overall point here is that permanent errors leave reasonable doubt that they will recover on their own, so retrying has a good chance of being futile. That doesn’t make recovery impossible, which is why we give the final word back to the consuming code (in the form of the callback with the stopped status and the error) so that, as with compiler warnings, you can decide to override the decision and start again. What we actually store is a list of predefined transient error codes; if an error doesn’t match any of those, it’s considered permanent. You can see a partial list here in the handling network errors section (though 1001 seems to be mislabeled). However, if you suspect that the error condition in your case is recoverable (we don’t have any sane way to know that on the library side), you can restart the replicator immediately or after a reasonable delay.
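As a rough illustration of restarting "after a reasonable delay", the following library-free sketch computes a capped, exponentially growing delay and gives up after a fixed number of attempts. The class name `RestartPolicy`, the set of codes, and the numbers are all assumptions for illustration; in particular, the codes are placeholders chosen by the app and are not Couchbase Lite's internal transient list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical app-side policy: when, if ever, to restart after a stop. */
final class RestartPolicy {
    // Placeholder codes the APP believes are recoverable in its environment.
    // This is NOT the library's own list of transient errors.
    private static final Set<Integer> ASSUMED_RECOVERABLE =
            new HashSet<>(Arrays.asList(408, 429, 502, 503, 504));

    private final long baseDelayMillis;
    private final long maxDelayMillis;
    private final int maxAttempts;

    RestartPolicy(long baseDelayMillis, long maxDelayMillis, int maxAttempts) {
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.maxAttempts = maxAttempts;
    }

    /**
     * @param errorCode code reported with the stop
     * @param attempt   0-based count of restarts already tried
     * @return delay in milliseconds before the next restart, or -1 to give up
     */
    long nextDelayMillis(int errorCode, int attempt) {
        if (!ASSUMED_RECOVERABLE.contains(errorCode)) {
            return -1; // not on the app's list: leave the replicator stopped
        }
        if (attempt < 0 || attempt >= maxAttempts) {
            return -1; // retry budget exhausted
        }
        // Double the delay on each attempt; the shift is bounded to avoid overflow.
        long delay = baseDelayMillis << Math.min(attempt, 20);
        return Math.min(delay, maxDelayMillis);
    }
}
```

With a base of 1000 ms and a cap of 30000 ms, for example, the delays grow as 1000, 2000, 4000 and so on until they hit the cap. A return value of -1 is the signal to stop retrying and surface the error instead.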

astentv · September 16, 2020, 2:24pm

Perfect. Thanks for your support.

jens · September 16, 2020, 6:46pm

I really hope you’re not hardcoding credentials in a production app.

astentv · September 18, 2020, 9:16am

We are still in development, which is why I am not worried at the moment.
Thanks guys. Have a nice day.
