Query change listener inner workings

First of all, an important question for anyone with deep knowledge of the Lite SDK (I am working with the Swift one, but I imagine they all work the same?):
Under the hood, does the Query.addChangeListener API just add a change listener to the collections referenced by the query and rerun the query when those collections receive changes? Or is it more efficient than that?

I am trying to gauge whether going through the Collection.addChangeListener API and manually rerunning my queries when a change is emitted would be much slower than going through the provided Query.addChangeListener API.
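To make the comparison concrete, here is roughly what I mean by the manual approach, sketched with hypothetical stand-in types (the real SDK types are Collection, Query, ListenerToken, etc. and differ from these stubs):

```swift
// Hypothetical stand-in for a collection that emits change notifications;
// the real Couchbase Lite Collection API differs.
final class FakeCollection {
    private var listeners: [(String) -> Void] = []
    func addChangeListener(_ listener: @escaping (String) -> Void) {
        listeners.append(listener)
    }
    func simulateChange(docID: String) {
        for listener in listeners { listener(docID) }
    }
}

// The manual approach: observe every collection the query touches and
// rerun the query ourselves whenever any of them emits a change.
final class ManualLiveQuery {
    private(set) var runCount = 0
    private let runQuery: () -> Void
    init(collections: [FakeCollection], runQuery: @escaping () -> Void) {
        self.runQuery = runQuery
        for collection in collections {
            collection.addChangeListener { [weak self] _ in
                self?.runCount += 1
                self?.runQuery()
            }
        }
    }
}
```

The question is whether the SDK's own live query does something smarter than this naive rerun-on-any-change loop, or is essentially equivalent to it.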

— — Some Context — —
This is all in the context of managing multiple database connections to utilise read concurrency. One-time Query.execute() calls I can manage fine, and I think I have found the best way to navigate that complexity in my scenario. My multi-connection implementation has drastically sped up our app, taking it from unusable to almost up to par with our previous Realm implementation, which we are now migrating away from.

To explain why the inner workings of the change listener APIs matter to me, here is a little background on what I have discovered so far through personal testing:

  1. Concurrent reads from multiple connections are fine, as long as they don’t read from the same collections. In other words, reading from scope.foo and scope.bar on two different connections performs both individual reads in the same time they would take if run individually. Performing two concurrent reads from scope.foo, however, slows both reads down substantially.
  2. One database iterates and maps results in a ResultSet faster if it only iterates over one ResultSet at a time. This means that having database1 try to concurrently unpack the ResultSets from query1 and query2 makes both iterators slower than having query2 wait for query1 to finish iterating over its own results before starting.
  3. Writes and key-based lookups are fast enough that they warrant their own “always available” connection that does not need any manual managing. The lock the SDK uses to manage interaction with a connection is more than enough for these.

Based on those findings here is a rough outline of how my multi-connection management works:

  • 1 connection for “direct access”:
    • handles all writes
    • handles all ID-based single-document getters
  • a manager with multiple (3 atm) connections for query execution and unpacking:
    • has an operations queue, plus a way to track busy collections and connections
    • when a query executes, it waits both for the collections it touches to be available (no connection reading from them) and for one of the connections to be idle
    • before a query execution starts, the collections it touches and the assigned connection are all marked as blocked. This prevents other connections from slowing down the read time by reading the same collections, and prevents the chosen connection from slowing down the ResultSet iteration time by working on anything else before it finishes with the query it was assigned
    • after the result is received, only the targeted collections are unblocked, so that other connections and queries can read from them. The connection is still kept busy while unpacking the results.
    • after the results are handled, the connection is also made available again for another query to use
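The scheduling policy in that outline can be sketched like this (a simplified, hypothetical model; my real implementation also has an operations queue and asynchronous waiting, which are omitted here):

```swift
// Hypothetical sketch of the policy above: a query may start only when
// none of its collections are busy AND a worker connection is free, and
// it blocks both for the duration of the read.
final class QueryScheduler {
    private var busyCollections = Set<String>()
    private var freeConnections: Set<Int>
    init(connectionCount: Int) {
        freeConnections = Set(0..<connectionCount)
    }

    // Returns a connection ID if the query can start now, nil otherwise.
    func tryAcquire(collections: Set<String>) -> Int? {
        guard busyCollections.isDisjoint(with: collections),
              let connection = freeConnections.first else { return nil }
        busyCollections.formUnion(collections)
        freeConnections.remove(connection)
        return connection
    }

    // Called once the ResultSet is received: other queries may read these
    // collections again, but the connection keeps unpacking its results.
    func releaseCollections(_ collections: Set<String>) {
        busyCollections.subtract(collections)
    }

    // Called after the results are fully unpacked.
    func releaseConnection(_ connection: Int) {
        freeConnections.insert(connection)
    }
}
```

With one connection, for example, a second query into scope.foo is rejected until both the collection and a connection free up, which mirrors findings 1 and 2 above.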

This leads to the crux of my issue: at the time a change comes in, I have no control over which database will run the change listener’s query. I can only smartly manage the first execution that happens when the change listener is established. Because that always happens at call time, I can funnel it through the same management logic. Unfortunately, subsequent fires will always happen on the database I initially assigned, chosen based on availability criteria that are no longer valid. This means it can (and in practice does) trip up my fancy footwork around efficiently utilising multiple connections.

I have thoughts on how to approach this, but all my ideas fall short of the fairly solid management I have come up with for one-time query execution, as they all involve some level of compromise. And so I came to wonder whether query change listeners do anything more, or anything more efficiently, than just rerunning the queries on collection changes. Because if they don’t, then I can just rely on collection changes instead and decide which database should run which query myself, based on the state of my connections at that moment in time.

Apologies for the long writeup, but figuring this out is both of extreme importance to me and somewhat hard to explain. If anything is unclear, please ask any clarifying questions.

TLDR: Does the Query.addChangeListener API just add a change listener to the collections referenced by the query and rerun the query when those collections receive changes, or is it much more efficient than that?
If it is much more efficient, then I would like to make a feature request for pre-query-evaluation hooks in the Query.addChangeListener API. That would also solve my issues, as I could mark the db and relevant collections as busy in the pre-hook, unblock the collections in the changes-received callback, and unblock the connection after the results are unpacked.
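To make the feature request concrete, the hooks I am imagining would look something like this (entirely hypothetical, not a real Couchbase Lite API):

```swift
// Hypothetical shape of the requested hooks; the names are my own and
// only illustrate the ordering I am after.
struct LiveQueryHooks {
    var willExecute: () -> Void       // mark connection + collections busy
    var didReceiveResults: () -> Void // unblock the collections
    var didUnpackResults: () -> Void  // unblock the connection
}

// Simulates one change-listener fire with the hook points in order.
func simulateListenerFire(hooks: LiveQueryHooks,
                          execute: () -> Void,
                          unpack: () -> Void) {
    hooks.willExecute()       // pre-query-evaluation hook
    execute()                 // the SDK reruns the query
    hooks.didReceiveResults() // changes-received callback
    unpack()                  // ResultSet iteration
    hooks.didUnpackResults()  // connection free again
}
```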


Pretty much, yes.
The Query observer gets notified of database changes, then re-runs the query. This happens at a much lower level than the Swift API, so I’m unsure of the performance implications, but I’d be interested to hear your results.

FWIW: A query is run at the database level, not the collection level. And the observer attached to the query is thread-safe as well.

If you have multiple queries that each target, or are isolated to, a specific collection, then using collection change listeners for those collections might make sense. Depending on when the updates happen for each collection, though, the total number of queries (re-queries) might increase, which might translate into losing some performance.

@vlad.velicu What I think I am going to experiment with is basically having a collection/database change observer on the direct-access connection (the same db instance used for document key lookups and writes), and when that receives a change, rebuilding and running the relevant queries on my worker connections (multiple databases used for concurrent reads) as they become available.
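A minimal sketch of that dispatch logic, assuming changes are funneled through a single observer and repeated changes for the same query coalesce into one pending rerun until a worker frees up (names and structure are my own, not SDK API):

```swift
// Hypothetical dispatcher: collection changes arrive on the direct-access
// connection; reruns are queued until a worker connection becomes idle.
final class RerunDispatcher {
    private var pending: [String] = []   // query IDs awaiting a rerun
    private var pendingSet = Set<String>()
    private var idleWorkers: [Int]
    init(workerCount: Int) { idleWorkers = Array(0..<workerCount) }

    // Called from the change listener on the direct-access connection.
    // Returns (queryID, worker) if a rerun can start immediately.
    func collectionChanged(affecting queryID: String) -> (String, Int)? {
        if !pendingSet.contains(queryID) {
            pending.append(queryID)
            pendingSet.insert(queryID)
        }
        return dispatchNext()
    }

    // Called when a worker finishes unpacking its results.
    func workerBecameIdle(_ worker: Int) -> (String, Int)? {
        idleWorkers.append(worker)
        return dispatchNext()
    }

    private func dispatchNext() -> (String, Int)? {
        guard !pending.isEmpty,
              let worker = idleWorkers.popLast() else { return nil }
        let queryID = pending.removeFirst()
        pendingSet.remove(queryID)
        return (queryID, worker)
    }
}
```

The coalescing also addresses the point above about re-query counts: a burst of changes to one collection triggers at most one queued rerun per affected query.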

Thank you both for the input. It seems like exploring this path is at the very least worth a shot. I will get something working next week and report back on my findings.
