The actual process of eviction is relatively simple now. When we need memory, we walk the hash tables looking for items we can drop (i.e. items that are already persisted on disk) and eject their values. We will also eject data as soon as it's persisted, but only for an inactive (e.g. replica) vbucket and only when we're above our low watermark for memory. If we have plenty of memory, we keep everything loaded.
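The pass described above can be sketched roughly as follows. This is an illustrative model only: `HashTableEntry`, `evict_pass`, and the single-unit memory accounting are assumptions for the sketch, not the engine's real types or bookkeeping.

```python
# Hypothetical model of an eviction pass: drop values that are already
# persisted until memory usage falls back under the low watermark.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HashTableEntry:
    key: str
    value: Optional[bytes]   # None once the value has been ejected
    persisted: bool          # True when the item is safely on disk

def evict_pass(entries, mem_used, low_watermark):
    """Walk the hash table and eject values that are safe to drop."""
    freed = 0
    for e in entries:
        if mem_used - freed <= low_watermark:
            break                  # reclaimed enough; stop early
        if e.persisted and e.value is not None:
            e.value = None         # keep key + metadata, drop the value
            freed += 1             # simplified: one unit per ejected value
    return freed
```

Note that only the value is dropped; the key and metadata stay resident, which is what makes the later "ejected record" lookups in this page possible.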
The bulk of this page is about what happens when we encounter values that are not resident.
In the current flow, a get request against a given key first consults the hash table. For any item we know about, the key and its respective metadata are always present in the hash table. In the case of an "ejected" record, only the value is missing, effectively a NULL pointer. This is a reasonable trade-off for larger objects, but not particularly efficient for small objects; that is being addressed in future versions.
When fetching a value, we will first look in the hash table. If we don't find it, we don't have it. MISS.
If we do have it and it's resident, we return it. HIT.
If we have it and it's not resident, we schedule a background fetch and let the dispatcher pull the object from the DB and reattach it to the stored value in memory. The connection is then placed into a blocking state so the client waits until the item has been retrieved from slower storage.
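The three outcomes above can be sketched as a small lookup function. `HashEntry`, the return codes, and the `schedule_bg_fetch` callback are illustrative names, not ep-engine's real API.

```python
# Hypothetical sketch of the get path: MISS, HIT, or block on a bg fetch.
from dataclasses import dataclass
from typing import Optional

@dataclass
class HashEntry:
    value: Optional[bytes]   # None means the value was ejected

def get(hash_table, key, schedule_bg_fetch):
    entry = hash_table.get(key)
    if entry is None:
        return "MISS", None                # unknown key: definite miss
    if entry.value is not None:
        return "HIT", entry.value          # resident: answer immediately
    schedule_bg_fetch(key)                 # ejected: fetch from disk later
    return "EWOULDBLOCK", None             # connection blocks until refetch
```

Because key and metadata are always resident, a MISS never requires touching disk; only the EWOULDBLOCK case does.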
The background fetch happens at some point in the future via an asynchronous job dispatcher.
When the job runs, the item is read from disk, the in-memory item is looked up, and, if (and only if) it is still not resident, its value is set to the result of the disk fetch.
Once the process is complete, whether the item was reattached from the disk value or not, the connection is reawakened so the core server will replay the request from the beginning.
It's possible (though very unlikely) for another ejection to occur before this job runs, in which case the entire fetch process begins again. The client takes no particular action after the get request until the server is able to satisfy it.
An item may already be resident when a background fetch completes, either because another background fetch for the same key completed first, or because another client modified the value since we looked in memory. In either case, we assume the disk value is older and discard it.
Concurrent reads and writes are sometimes possible under the right conditions. When these conditions are met, reads are executed by a new dispatcher that exists solely for read-only database requests; otherwise, the read-write dispatcher is used.
The underlying storage layer reports the level of concurrency it supports at startup time (specifically, post init-script evaluation). For stock SQLite, concurrent reads are allowed if both the journal-mode is WAL and read_uncommitted is enabled.
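The two SQLite conditions can be checked with pragmas. A minimal sketch using Python's sqlite3 module (the file path is illustrative, and `read_uncommitted` only actually takes effect across shared-cache connections):

```python
import os
import sqlite3
import tempfile

# Use a file-backed DB: WAL mode is not available for in-memory databases.
path = os.path.join(tempfile.mkdtemp(), "store.db")
conn = sqlite3.connect(path)

# Switch to write-ahead logging; the pragma returns the resulting mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Allow dirty reads (meaningful for shared-cache connections).
conn.execute("PRAGMA read_uncommitted = true")
read_uncommitted = conn.execute("PRAGMA read_uncommitted").fetchone()[0]

# Concurrent reads are permitted only when both conditions hold.
concurrent_reads = journal_mode.lower() == "wal" and bool(read_uncommitted)
```

WAL lets readers proceed while a writer is active, and `read_uncommitted` relaxes reader locking further, which together make the dedicated read-only dispatcher safe to use.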
Future storage mechanisms may allow for concurrent execution under different conditions and will indicate this by reporting their level of concurrency differently.
The concurrentDB engine parameter allows the user to disable concurrent DB access even when the DB reports it's possible.
The possible concurrency levels are reported via the ep_store_max_concurrency, ep_store_max_readers, and ep_store_max_readwrite stats. The dispatcher stats will show the read-only dispatcher when it's available.
New data is better than old data, so a set always wins. Similarly, a delete always wins. Increment, decrement, add, etc. are all atomic, but you can imagine them working as a get + store.
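The get + store mental model for the counter operations can be sketched as below. The in-memory `store` dict and the helper names are hypothetical; the engine performs each pair atomically under the item's lock.

```python
def incr(store, key, amount=1, initial=0):
    # Conceptually a get followed by a store, executed atomically.
    value = int(store.get(key, initial)) + amount
    store[key] = value
    return value

def decr(store, key, amount=1, initial=0):
    # In memcached semantics, a counter never goes below zero.
    value = max(int(store.get(key, initial)) - amount, 0)
    store[key] = value
    return value
```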