Under normal operation the cluster is in a “stable” state and the configuration doesn’t change. If your application uses short-lived connections, or many connections from the same machine, it is constantly asking the server for its current configuration. These are HTTP streaming requests, which ns_server doesn’t implement very efficiently (they are slow to set up and consume a fair amount of resources in ns_server). As an example, it takes 18 seconds to connect 1000 clients over the loopback interface on my quad-core AMD machine with 4 GB of memory.
The idea is to use a local cache in the file system on the clients so that the clients don’t talk to ns_server unless there is an actual change in the cluster topology.
New instances are created by calling lcb_create_compat with the type set to LCB_CACHED_CONFIG:
lcb_create_compat(LCB_CACHED_CONFIG, &specific, &instance, io);
The specific argument points to a new structure that looks like:
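A sketch of that structure, consistent with the fields described below (the createopt member and the exact field names are my assumption; I mirror the usual lcb_create_st creation options):

```c
struct lcb_cached_config_st {
    struct lcb_create_st createopt; /* assumption: the normal creation options */
    const char *cachefile;          /* path to the configuration cache */
    const char *lockfile;           /* may be NULL; see below */
};
```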
The cachefile is the path (relative or absolute) of the file containing the cache. If set to a non-NULL value, lockfile specifies the file used to synchronize updates to the cachefile (we don’t want all instances trying to update the cache at the same time). If no lockfile is specified, “.lock” is appended to cachefile.
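Deriving the default lockfile name is a simple string append; a minimal sketch (the helper name is illustrative, not part of the proposed API):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Derive the default lockfile name when the user didn't specify one:
 * append ".lock" to the cachefile path. Caller frees the result. */
static char *default_lockfile(const char *cachefile)
{
    size_t len = strlen(cachefile);
    char *lockfile = malloc(len + sizeof(".lock"));
    if (lockfile != NULL) {
        memcpy(lockfile, cachefile, len);
        memcpy(lockfile + len, ".lock", sizeof(".lock"));
    }
    return lockfile;
}
```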
Given that the cache is intended to be accessed by multiple processes at the same time, the user is responsible for deleting the cachefile and the lockfile when they are no longer in use.
We need to extend the internal instance metadata so that when we get a “not my vbucket” response we know we are running from a cached configuration that has changed, and that we should fall back to the full bootstrap logic.
To avoid a “burst” of clients updating the cache when the topology changes, a client first tries to create the lockfile. If that fails because the file already exists, the client checks the age of the lockfile (to work around stale locks). If the lockfile is older than two seconds, the client goes ahead and updates the configuration anyway (and removes the lockfile when it is done). In this situation you get a burst of connect attempts anyway.
The entire JSON for the current configuration is dumped in the cache file.
If the lockfile exists and is “new”, the client will “busy”-wait (just a really short sleep between checks) and poll for the existence of the lockfile (unless the platform supports monitoring a file).