Version depth and sync metadata

How do we manage the version depth on CBL? How can we manage the _sync data growth on the CBserver/sync GW side?

On the client, set the Database.maxRevTreeDepth property. (The default is 20, I think.)

On SG, define a revs_limit property in the JSON config object for a database. (The default is 1000.)
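
To make the placement concrete, here is a minimal sketch that builds such a config and prints it as JSON. The database name is borrowed from the URLs later in this thread; the server URL, bucket name, and the value 200 are placeholders I made up, not recommendations.

```python
import json

# Hypothetical values throughout; only the position of "revs_limit"
# (inside the per-database object) is the point of this sketch.
config = {
    "databases": {
        "testdata": {
            "server": "http://localhost:8091",  # assumed Couchbase Server URL
            "bucket": "testdata",               # assumed bucket name
            "revs_limit": 200,                  # example value, tune for your data
        }
    }
}

config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed JSON is what would go in the Sync Gateway config file, and the gateway would need to be restarted with that file for the setting to apply.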

Don’t set the server value too low, or clients will start getting false conflicts if they’re offline for a while and then log in again. Of course “too low” depends on your data model and how often docs change and clients connect, so I can’t really quote a specific number. But I’m sure 1000 is conservative so you could lower it a lot in most cases.
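
A toy model of why a low limit produces false conflicts, assuming a purely linear history and made-up revision IDs. This is not Sync Gateway's actual pruning code, just the idea: once the server has forgotten the revision an offline client last saw, it cannot attach the client's edit to the existing branch.

```python
def prune(history, revs_limit):
    """Keep only the newest revs_limit revision IDs (history is oldest first)."""
    return history[-revs_limit:]

def is_false_conflict(server_history, client_parent):
    """True if the server no longer remembers the revision the client edited."""
    return client_parent not in server_history

# Hypothetical document with 30 revisions on the server.
server_history = [f"{n}-rev" for n in range(1, 31)]
# The client went offline after seeing revision 10 and edited that one.
client_parent = "10-rev"

print(is_false_conflict(prune(server_history, 1000), client_parent))  # False
print(is_false_conflict(prune(server_history, 5), client_parent))     # True
```

In this model the limit needs to be at least as large as the number of revisions a document can accumulate while a client is away.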

Also, remember that the rev tree just stores the revision IDs. The actual JSON bodies of the revisions get cleaned up soon after a new revision is added.

Thanks Jens for the info.

  1. That takes care of version depth at the database level. Is there an equivalent at the document level? What is the best practice if for some docs we only need to keep the latest version, but for others we need more for conflict management?
  2. The _sync block for a document keeps track of all the access, channels, history, etc. Is there any way to limit that on the server side? We have some documents that users can update quite frequently, and we are struggling to manage the metadata growth in such cases. Any advice?
  3. What is the max and average metadata overhead per revision for a document?

  1. Sorry, it’s not configurable per document.
  2. revs_limit, as I said, will limit that.
  3. I don’t know exactly. You could do some experiments and look at the size of the raw document in the bucket, before and after making changes.
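
One way to run that experiment, as a rough sketch: fetch the raw document from the bucket however you normally do, then compare the size of its _sync block before and after an update. The helper name and the two sample documents below are invented for illustration; a real _sync block has more fields than this.

```python
import json

def sync_overhead(raw_doc: str) -> dict:
    """Report how many bytes of a raw bucket document are _sync metadata."""
    doc = json.loads(raw_doc)
    sync_block = doc.get("_sync", {})
    # Serialize compactly so whitespace does not inflate the counts.
    compact = {"separators": (",", ":")}
    return {
        "total_bytes": len(json.dumps(doc, **compact).encode("utf-8")),
        "sync_bytes": len(json.dumps(sync_block, **compact).encode("utf-8")),
    }

# Hypothetical raw documents, captured before and after one update.
before = '{"type":"note","text":"hi","_sync":{"rev":"1-a","sequence":5,"history":{"revs":["1-a"]}}}'
after = '{"type":"note","text":"hi","_sync":{"rev":"2-b","sequence":6,"history":{"revs":["1-a","2-b"]}}}'

growth = sync_overhead(after)["sync_bytes"] - sync_overhead(before)["sync_bytes"]
print("metadata growth for one revision:", growth, "bytes")
```

Repeating this over a series of updates to one of your own documents should give a usable per-revision figure for your data model.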

Thank you. I will try these.

I tried the revs_limit query but it’s not working. I am using CB 4.0 beta and Sync GW 1.1. Any idea what I am doing wrong? Please advise.

[root@guest5 Sync-Gw]# curl -X GET http://localhost:4985/testdata/_revs_limit

{"error":"not_found","reason":"unknown URL"}

[root@guest5 Sync-Gw]# curl -X GET --trace-ascii sync_trace.out -H "Accept: application/json" "http://172.29.21.189:4985/testdata/_revs_limit"

{"error":"not_found","reason":"unknown URL"}

It’s not a query or part of the REST API. As I said, it’s a property in the JSON config object for a database. You’ll have to edit the config file to change it.

Thanks Jens, that’s working, except that “recent_sequences” keeps on growing and revs_limit has no effect on it. Is this the expected behaviour?

Thx.

I don’t know about the recent_sequences property … looks like that was added after I stopped working on that area of the code. Maybe @adamf or @andy know?

The recent_sequences property should be limited to a maximum of 20 - are you seeing it grow beyond that value?

We set revs_limit to 5 as a test, and recent_sequences does not take that value into consideration. Is the “20” fixed?

Correct - recent_sequences is unrelated to revs_limit. recent_sequences helps Sync Gateway account for multiple sequences for the same doc ID that have been deduplicated by the Couchbase Server feed - it’s unrelated to revision tree management.

Since it’s only storing the sequence number (not the larger revision history) in recent_sequences, there’s no fine-tuning of the size - the limit of 20 is fixed. Let me know if it’s causing problems for you, though - we could consider additional tuning.
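
As a toy illustration of what a fixed cap of 20 means for document size (this is not the Sync Gateway implementation, and the function name is made up): the list grows with each new sequence until it reaches the cap, then the oldest entries fall off.

```python
MAX_RECENT_SEQUENCES = 20  # the fixed limit described above

def record_sequence(recent, seq, limit=MAX_RECENT_SEQUENCES):
    """Append seq and keep only the newest `limit` entries."""
    return (recent + [seq])[-limit:]

recent = []
for seq in range(1, 51):  # simulate 50 updates to one document
    recent = record_sequence(recent, seq)

print(len(recent), recent[0], recent[-1])  # 20 31 50
```

So the growth you see should level off after 20 updates, regardless of what revs_limit is set to.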

Thanks. We will try to work around it, but would it make more sense to tie this to the same config property, or to make it separately configurable?