Thoughts about CBL and large datasets

I have a line of thinking that I’d like to put out there to solicit comment or feedback on things I may have overlooked/misunderstood.

I have a CB server with a Gateway and a bucket with some 700K docs in it. Presumably there will be a few such buckets. These buckets will need to get replicated to handheld devices.

I just managed to get my first sync running and it took over 30 mins to go from 0 to 700K docs. (I haven’t found a bulk replication API yet…) Further, there will be multiple applications in the handheld that will share the same CBL (so that we don’t have multiple copies of the same bucket on the same device).

Does it make sense to run a webserver on the handheld as a means of servicing requests from localhost apps? If so, I would imagine I could implement the methods by calling the REST APIs on the Sync Gateway while the data is loading to the local CBL.

In other words, for the first hour (or so) the handheld apps would be fully functional as long as they are connected to Wi-Fi. After the sync is complete, the handheld can revert to normal responses from the local CBL.

All of the handhelds at the same store location would sync to the same buckets/channels, so after the initial sync things should be quick (assuming you didn’t switch stores, which would require a delete and a new initial sync).

Does this make sense? Is this something others have tried?

Thanks,
Doug

Is this the time it takes to replicate from server to clients? How are you measuring this?

Not sure what “bulk replication API” you are looking for. Sync Gateway continuously imports documents from server buckets, subject to import filters, and makes them available as soon as possible to any connected clients, subject to access control restrictions.

Does it make sense to run a webserver on the handheld as a means of servicing requests from localhost apps? If so, I would imagine I could implement the methods by calling the REST APIs on the Sync Gateway while the data is loading to the local CBL.

You may want to read this blog for some patterns on sharing database across apps.
Also, what type of platform is your CBL running on?

That’s ~400 docs/sec, which is on the low side. How big are the docs? What kind of client device is this? An iOS device should be able to insert several thousand docs/sec, as long as the server can keep up.

Further, there will be multiple applications in the handheld that will share the same CBL

We don’t support simultaneous access to a database by multiple processes. For the most part it will work, but listeners in one process will not be notified when the database is changed by another process.

Does it make sense to run a webserver on the handheld as a means of servicing requests from localhost apps?

It’s slow compared to in-process database requests. And dealing with all the HTTP marshaling is annoying (depending on what OS or libraries you’re using.) It also opens you up to potential security holes if you’re not careful: you have to make sure you bind only to the loopback interface, and even then (if you’re not on iOS) other untrusted apps on the device could connect to your socket and access the database.
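The loopback-only binding mentioned above can be sketched in plain Java as follows. This is a minimal sketch, not a full web server: the class name `LoopbackOnlyServer` and the helper `openLoopbackSocket` are hypothetical names made up for illustration, and port 0 simply asks the OS for any free port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical example class, not part of Couchbase Lite.
public class LoopbackOnlyServer {

    // Opens a listening socket bound to the loopback interface only,
    // so it cannot be reached from other hosts on the network.
    public static ServerSocket openLoopbackSocket(int port) throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS choose a free port.
        try (ServerSocket server = openLoopbackSocket(0)) {
            System.out.println("Listening on " + server.getLocalSocketAddress());
        }
    }
}
```

Binding this way only keeps remote hosts out. As noted above, other apps on the same device could still connect to the socket, so some form of authentication would still be needed.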

Thanks for the blog reference. This is right down the path of the sorts of things I am pondering (prepopulating data, multiple apps sharing data, etc.).

Doug

The docs have something like 20 attributes each. Right now I am using the Android emulator, and the latest run took about 26 minutes to sync 880K documents. The emulator is running on a Mac connected by Wi-Fi, through a private network, to an Azure cloud instance that is running the Sync Gateway and Couchbase.
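For comparison with the docs/sec figure quoted earlier in the thread, the two timings can be converted to average throughput. This is just the arithmetic, using the numbers reported above; the class and method names are hypothetical. The first run was described as "over 30 mins", so its figure is an upper bound.

```java
// Hypothetical helper for converting the reported sync timings to docs/sec.
public class SyncThroughput {

    // Average documents per second for a sync of `docs` documents
    // that took `minutes` minutes.
    public static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // 700K docs in 30 minutes (first run)
        System.out.printf("first run:  %.0f docs/sec%n", docsPerSecond(700_000, 30)); // 389
        // 880K docs in 26 minutes (latest run)
        System.out.printf("latest run: %.0f docs/sec%n", docsPerSecond(880_000, 26)); // 564
    }
}
```

Both runs come out at a few hundred docs/sec, well below the several thousand docs/sec mentioned earlier for a physical device.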

I think I will have to quantify how much of a hit you take when you use a headless local service for sharing vs. natively accessing a shared document.

As for the device, at least initially these devices should be locked down as to what apps will be on them. I suppose there might be a need for support of BYOD (Bring Your Own Device) where heightened security might be a concern.

I am told Android emulators are not very fast… but @blake.meike would know better.

Emulators were hella slow, 5 years ago. They have improved, steadily, over the years. Depending on what the host machine is, and which emulator you are using, you may see as much as a 2x improvement on a modern device.

Specifically, I have a Pixel II and a Nexus 4. The x86_64 emulator running on my Mac is somewhat faster than the Nexus, and significantly slower than the Pixel.

So… assuming I were to try to use “in-process database requests”… I have two native Java apps on Android. Do I need to do anything special to make them share the same CBL? Do I need to specify some special directory in the DatabaseConfiguration?

Or do I need to make one app be the “owner” of the CBL and have other apps talk to the owner via AIDL or some other such mechanism?

Just using a default DatabaseConfiguration and new Database with the same name does not seem to work. App1 creates and syncs with a gateway and works properly. App2 has no replication, it merely uses the default DatabaseConfiguration and creates a Database with the same name.

App2 appears to have an empty CBL.

Did you check the blog reference that I shared earlier? It discusses implementation options for Android and caveats around sharing databases.

Yep. You end up essentially creating a headless server app and using inter-app communication. I’m looking at that now.

I just wanted to see if anyone had figured out how to have two or more apps point to the same local file (putting it in a shared location or some such). … Not a problem, just trying to understand the options to help navigate to the best approach.

Doug

I suspect the two apps are not using the same database file. The default database configuration uses a directory local to the app, so two apps would end up with different directories. You’ll have to find a shared location both apps can access and point the config to it.

How to get that shared location is platform specific. I am not an Android developer, but I believe Android is more permissive about this than iOS, where it takes some special configuration to get a shared container.
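The effect described above can be illustrated without any Couchbase Lite code. This is a sketch under assumptions: the directory paths and the database name `inventory` are hypothetical, the class `DatabaseLocation` is made up for illustration, and the `.cblite2` suffix reflects how Couchbase Lite 2.x names its on-disk database directory. It only shows that the same database name resolves to different files when each app uses its own private directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of where a named database ends up on disk.
public class DatabaseLocation {

    // Resolves the on-disk location of database `name` under `directory`.
    public static Path resolve(String directory, String name) {
        return Paths.get(directory).resolve(name + ".cblite2");
    }

    public static void main(String[] args) {
        // Assumed per-app private directories (hypothetical package names).
        String app1Dir = "/data/data/com.example.app1/files";
        String app2Dir = "/data/data/com.example.app2/files";
        // Assumed location that both apps can read and write.
        String sharedDir = "/path/to/shared/cbl";

        // Same name, different default directories: two separate databases.
        System.out.println(resolve(app1Dir, "inventory"));
        System.out.println(resolve(app2Dir, "inventory"));

        // Same name, same explicit directory: one database file.
        System.out.println(resolve(sharedDir, "inventory"));
    }
}
```

Even with a shared directory, the caveat from earlier in the thread still applies: simultaneous access to one database from multiple processes is not supported.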