Trying to run multiple replicators in parallel to speed up the download results in a "database is locked" error every now and then

In our use case we have around 3 million DB rows to download, which takes around 2 hours — a bit unacceptable for us. To speed up the process we decided to run multiple replicators in parallel, but that leads to about 10% of the inserts failing with the following message.

{N8litecore4repl8InserterE#42} Failed to insert ‘781346ec-60a7-4edb-b4fc-ff9baxe7f68c’ #1-x345ascf8f3f4e3c3e3cf97fe1b20493abd5e : LiteCore Busy, “database busy/locked”

It looks like that while one thread is writing to the DB, other threads are unable to write. But instead of either waiting or re-attempting the insertion, the thread that failed to write just moves on to the next row.

I don’t think that is going to work. The database is a file on the file system. It cannot be updated by multiple threads at the same time.

Let me see if I can find an expert to suggest alternatives…

Unfortunately I’ve also run into this problem using just one replicator during long-running replications.
It’s rare, but when it happens, clients that have completed replication — and that you would expect to have fully updated documents — instead have randomly outdated documents. The problem is that the replicator never retries the failed documents when this happens. I would expect it to retry on the next replication attempt. Currently the only way I’ve found to force it to retry a document is to create and save another revision of the failed document on the server.

@hyling If you have encountered either a locked database during replication or randomly outdated documents, please report a bug. Neither one of those things should happen, ever.

I do have another thread that writes to the database while replication is happening. I assumed it was the same problem you described above: multiple threads can’t write to the database file at the same time?

Yeah, in that case, you are right.

Unlike large DB systems, SQLite is just a complicated way of writing to a file. The OS won’t let multiple threads do that at the same time…
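To be precise, SQLite does let multiple threads read concurrently, but it allows only one write transaction at a time; a second writer that can't get the lock fails with `SQLITE_BUSY` ("database is locked"). Here is a minimal sketch of that single-writer rule using plain Python `sqlite3` (not Couchbase Lite — CBL's internals are not shown here, this just demonstrates the underlying SQLite behavior):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# timeout=0 disables the busy-wait, so a lock conflict fails immediately
# instead of blocking (CBL's arbitration is more sophisticated than this).
writer1 = sqlite3.connect(path, timeout=0)
writer2 = sqlite3.connect(path, timeout=0)

writer1.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
writer1.commit()

writer1.execute("BEGIN IMMEDIATE")  # writer1 takes the write lock
writer1.execute("INSERT INTO docs VALUES ('a', '{}')")

try:
    # Second writer tries to insert while writer1 holds the lock.
    writer2.execute("INSERT INTO docs VALUES ('b', '{}')")
    locked = False
except sqlite3.OperationalError as e:
    locked = "locked" in str(e)
    writer2.rollback()

writer1.commit()  # release the write lock...
writer2.execute("INSERT INTO docs VALUES ('b', '{}')")  # ...and the retry succeeds
writer2.commit()

print(locked)  # True
```

The point of the retry at the end: the second write isn't impossible, it just has to wait its turn — which is why a replicator queueing or retrying insertions (rather than dropping them) is the expected behavior.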

@blake.meike, I didn’t find any workaround yet. Would it be possible for the replicator to re-attempt the failed insertions, or perhaps wait for the lock to be released before attempting an insertion? I also want to add that our iOS app with the Couchbase Lite library uses the same strategy of running multiple replicators, but there the database does not get locked.

Replicators do, in fact, queue insertions until the lock is free.

I think the issue here is the misperception that, somehow, running multiple replicators will speed up a transaction. Replicators are already quite smart about how they exchange data, using implementation details that client code cannot possibly access.

Thanks for the reply. We have actually found a 3-4x speedup on iOS when running 5 replicators in parallel, and we were hoping for the same on Android. I do have one suggestion: I read on one of the forums quite recently that threads in replicators wait 10 seconds for the lock to be released before giving up with the insertion-failed error. If that’s true, could this 10-second wait be increased or made configurable?

@chauhanabhi321 – I am aware of an issue in Mobile that might be throttling replicator throughput in some situations, in which case you would see higher performance with multiple replicators in parallel. There’s a fix in progress but no ETA.

The lock-timeout error is something we’ll definitely look into. That shouldn’t be happening. SQLite is limited to a single write transaction at a time, but CBL usually arbitrates that pretty well across multiple threads; it sounds like there’s a situation with the replicator where that isn’t happening and SQLite’s cruder busy-wait kicks in.

However, running multiple replicators with the same configuration is not allowed and won’t work reliably – it’s quite possible you’ll end up with documents being skipped, especially if any errors cause the replicator to retry.

(The reason is that each unique replicator configuration (direction, channels, docIDs…) stores its state in a “checkpoint” document on both client and server. Two replicators with the same config will write to the same checkpoint and overwrite each other’s state.)
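The collision described above can be sketched in a few lines of plain Python. This is a conceptual model with hypothetical names (`checkpoint_key`, `save_checkpoint`), not CBL's actual checkpoint implementation — the only point it illustrates is that the checkpoint's identity is derived from the configuration, so two replicators with identical configs read and write the same entry:

```python
import hashlib

checkpoints = {}  # stands in for the checkpoint documents on client/server


def checkpoint_key(direction, channels, doc_ids):
    # The key depends only on the configuration, not on which
    # replicator instance happens to be running.
    blob = f"{direction}|{sorted(channels)}|{sorted(doc_ids)}".encode()
    return hashlib.sha1(blob).hexdigest()


def save_checkpoint(direction, channels, doc_ids, last_sequence):
    checkpoints[checkpoint_key(direction, channels, doc_ids)] = last_sequence


# Replicator A (config: pull, channel "ch1") has progressed to sequence 900...
save_checkpoint("pull", ["ch1"], [], 900)
# ...then replicator B, with the identical config, saves its older state,
# clobbering A's progress:
save_checkpoint("pull", ["ch1"], [], 250)
print(checkpoints[checkpoint_key("pull", ["ch1"], [])])  # 250

# A replicator with a distinct channel set gets a distinct key,
# so its checkpoint does not collide:
save_checkpoint("pull", ["ch2"], [], 500)
print(len(checkpoints))  # 2
```

The last two lines are why partitioning channels across replicators (as discussed below in this thread) is safe while running identical configs is not.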

BTW, what versions of CBL and SG are you using?


Thanks for the detailed explanation @jens . Currently, we are using version 3.0.0 with only the PULL mechanism.

@jens , I would like to add one more point. We are not exactly running multiple replicators on the same config. We are providing a different set of channels to each replicator, so no two replicators would try to download from the same channel. Also, we feed a replicator one channel at a time: once that replicator has downloaded everything from a channel, we feed it the next channel.

That should be ok; each replicator has a distinct config and their checkpoints won’t collide.

As for the time to download: Have you considered shipping a database file inside your app with the latest version of the data? This can then be copied into the app’s mutable storage as the original database. The first sync then only has to download whatever’s changed since that database was captured.

Another approach, if you want the app to remain small or if the data set changes too rapidly, is to have a server-side process that periodically creates a fresh CBL database and pulls from the server, then zips that and puts it where clients can download it over HTTP. Your app can then GET that on first launch, unzip, and use the copy-database API to install it as its new db.
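The client side of that flow can be sketched with Python stdlib only. The "download" is simulated here by building the zip locally (a real app would GET it over HTTP, and the final install step would use CBL's copy-database API rather than a bare file copy); the directory names are hypothetical:

```python
import os
import shutil
import tempfile
import zipfile

work = tempfile.mkdtemp()

# --- server side: zip up a freshly pulled database directory ---
snapshot_dir = os.path.join(work, "snapshot.cblite2")
os.makedirs(snapshot_dir)
with open(os.path.join(snapshot_dir, "db.sqlite3"), "w") as f:
    f.write("pretend database contents")
zip_path = shutil.make_archive(
    os.path.join(work, "snapshot"),  # -> work/snapshot.zip
    "zip",
    root_dir=work,
    base_dir="snapshot.cblite2",
)

# --- client side, on first launch: "download" the zip, unzip to a
# staging directory, then install the result as the local database ---
staging = os.path.join(work, "staging")
with zipfile.ZipFile(zip_path) as z:
    z.extractall(staging)

installed = os.path.join(staging, "snapshot.cblite2", "db.sqlite3")
print(os.path.exists(installed))  # True
```

After the unzipped copy is installed, the first real replication only needs to pull whatever changed since the snapshot was taken, which is where the time savings come from.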

@jens , thanks for the above solutions. The first solution would not be possible in our case, because we have a wide range of huge databases and which one to download depends on the logged-in user. The second solution looks interesting, but it would be more efficient for us to run multiple replicators and do the job from the front end itself. In any case, the iOS team at my company is already running multiple replicators without any problem. It would be a big help if the database-locked issue could be resolved at the Couchbase library level itself.

I tested this again. This was actually just fixed in version 3.1.0 for iOS. In my tests it was still an issue in 2.8.4, 3.0.0, 3.0.1, and 3.0.2. Cheers to everyone involved in this fix! :raised_hands:t3: :tada: :beers:


@hyling , please have it fixed in Android too.

It sounds like you’re using Android 3.0.0 — have you tried it with 3.1.0?