I am seeing thousands of warnings like the one below each day in my sync gateway logs (v2.7):
2020-10-28T21:48:40.781Z [WRN] c:[42db87f4] MultiChangesFeed got error reading changes feed "all_stores": Changes feed terminated while waiting for view lock -- db.(*Database).SimpleMultiChangesFeed.func1() at changes.go:548
I would like to understand the impact of this lock contention and timeout. Is it hurting replication performance to mobile devices? Could it cause client connections to be dropped? I ask because I also see a lot of ‘connection reset by peer’ errors on the mobile devices. How do I correct this problem?
This indicates that a client disconnected while it was waiting for the view lock. The lock isn’t what causes the client to disconnect; it’s more likely that the client is disconnecting because it timed out while waiting.
This particular lock (“view lock”) restricts concurrent channel queries to one per channel. The motivation is that if many clients are all trying to query and replicate the same channel, it’s generally preferable to have one client perform the query, publish results to the cache, and have the remaining clients served from the cache. There are a few things that may cause this to not be optimized:
If the channel cache is smaller than the query result set, each client will end up needing to re-execute the query. Increasing the cache size would help here.
If the cache is large but the channel/query is also large, the waiting clients may be timing out before the first query results are returned. In this case, adjusting the client timeout would be an option.
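To make the locking behaviour described above more concrete, here is a minimal Python sketch of the general pattern: one query per channel at a time, with the remaining callers served from the cache once the first query has published its results. This is only an illustration, not Sync Gateway's actual implementation (which is written in Go). The names `ChannelCache`, `run_query`, and `get_changes` are hypothetical, and the toy cache has no size limit, so it does not model the eviction case.

```python
import threading
from collections import defaultdict


class ChannelCache:
    """Toy model: at most one query per channel at a time; results shared via a cache."""

    def __init__(self, run_query):
        self._run_query = run_query                 # hypothetical: the expensive channel query
        self._cache = {}                            # channel name -> cached query results
        self._locks = defaultdict(threading.Lock)   # one "view lock" per channel
        self._guard = threading.Lock()              # protects the _locks dictionary
        self.query_count = 0                        # how many real queries were executed

    def _lock_for(self, channel):
        with self._guard:
            return self._locks[channel]

    def get_changes(self, channel, timeout=None):
        lock = self._lock_for(channel)
        # Waiting clients block here. A client that gives up before the lock
        # is available is roughly the situation the warning reports.
        acquired = lock.acquire(timeout=-1 if timeout is None else timeout)
        if not acquired:
            raise TimeoutError(f"gave up waiting for view lock on {channel!r}")
        try:
            if channel not in self._cache:
                # Only the first client through the lock runs the query.
                self.query_count += 1
                self._cache[channel] = self._run_query(channel)
            return self._cache[channel]
        finally:
            lock.release()
```

In this model, if the query is slow and a waiting caller passes a short `timeout`, that caller raises `TimeoutError` before the results are published, which mirrors the second case above. If the cache could not hold the results, every caller would fall into the query branch in turn, which mirrors the first.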
Thanks for the info, @adamf. Looking at the Sync Gateway stats from /_expvar, I see a 1.8% cache miss rate for the channels cache. This doesn’t seem large, but given that these lock timeouts are occurring at a rate of about 1 per sec, I think we do have a problem. We will experiment with increasing the cache max_length, which is currently 5,000, although some of our channels have over 1 million documents.
I saw that the revs cache has a 94% cache miss rate. It is currently set to the default of 5,000. I may try experimenting with increasing that as well. Does this lock timeout warning apply only to the channel cache, or could inefficiency in the revs cache also affect it?
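For anyone wanting to reproduce figures like these, here is a rough Python sketch of computing the miss rates from a parsed /_expvar response. The stat names and nesting used here (`syncgateway` → `per_db` → database name → `cache`, with `chan_cache_hits`, `chan_cache_misses`, `rev_cache_hits`, `rev_cache_misses`) are assumptions about the layout and should be checked against the actual output of your Sync Gateway version; `cache_miss_rates` is a hypothetical helper.

```python
def miss_rate(hits: int, misses: int) -> float:
    """Fraction of lookups that missed the cache (0.0 if there were no lookups)."""
    total = hits + misses
    return misses / total if total else 0.0


def cache_miss_rates(expvar: dict, db_name: str) -> dict:
    """Return channel-cache and rev-cache miss rates for one database.

    ASSUMPTION: the key names below reflect how the stats are laid out in the
    /_expvar JSON; adjust them if your output differs.
    """
    cache = (
        expvar.get("syncgateway", {})
        .get("per_db", {})
        .get(db_name, {})
        .get("cache", {})
    )
    return {
        "channel_cache": miss_rate(
            cache.get("chan_cache_hits", 0), cache.get("chan_cache_misses", 0)
        ),
        "rev_cache": miss_rate(
            cache.get("rev_cache_hits", 0), cache.get("rev_cache_misses", 0)
        ),
    }
```

The input would be the JSON body of /_expvar parsed into a dict (for example with `json.loads`), and `db_name` is the name of the Sync Gateway database whose stats you want.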
The view lock is specifically related to the channel cache (and serving changes requests) - the rev cache hit/miss ratio would be unrelated here.
Another consideration, particularly when you’ve got large channels (more than 1M documents), is the expected usage of those channels. If clients are regularly going to be pulling the entire, non-cache-resident channel from zero, then some query contention is going to occur. A common way to avoid this is to seed client databases in the case of large, semi-static channels, so that clients are only pulling recent changes, which can be resident in the cache.
However, there is also an enhancement in Sync Gateway 2.8.0 - adding pagination when querying very large channels - that may help for your use case. https://issues.couchbase.com/browse/CBG-821 has additional detail.
Thanks again, good info. We are looking into seeding with a prebuilt DB but have encountered some issues with that which are covered in a support ticket. We hope to get that working soon. I’ll look into pagination.