How do you get or calculate the size of a Couchbase Lite document using Swift?

I am using Couchbase Lite on the iPhone with Swift 3.0. I have found some JSON examples that get the size in bytes, but these do not work with a CBLDocument in Swift. I have also thrown together a somewhat terrible way of accomplishing what I need. Please show me there is a better way to get the size of a CBLDocument in bytes.

func getDocumentSize(_ doc : CBLDocument) -> UInt64
{
    var size = 0

    for x in (doc.properties)!
    {
        size += x.key.lengthOfBytes(using: .utf8)

        if ((x.value as AnyObject).isKind(of: NSString.self))
        {
            size += (x.value as! String).lengthOfBytes(using: .utf8)
        }
        else if ((x.value as AnyObject).isKind(of: NSNumber.self))
        {
            size += MemoryLayout<NSNumber>.size
        }
        else if ((x.value as AnyObject).isKind(of: NSArray.self))
        {
            size += (x.value as! NSArray).count * MemoryLayout<NSArray>.size
        }
    }

    return UInt64(size)
}

We don’t have a method for that, and there’s no way to get the raw JSON stored in the database. (In fact there’s no guarantee we are storing JSON in the database; we do in 1.x, but the 2.0 in development uses a more compact binary format. It’s an implementation detail.)

You could use something like what you’re doing, except that

  • For numbers you need to work out the number of decimal digits. The memory size occupied by an NSNumber has nothing to do with the written length of the number in ASCII.
  • For arrays, again, looking at NSArray won’t tell you anything. Instead you need to recurse over the elements in the array, adding up their encoded sizes.
  • Same for dictionaries, but add up the lengths of the keys as well as the values.
  • Don’t forget to take into account the two quote characters around strings, the commas between array/dictionary elements, the colon after a dictionary key, and the braces/brackets around arrays and dictionaries.
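The points above can be sketched roughly as follows. This is only an illustration, not a Couchbase Lite API: `JSONValue`, `encodedStringSize` and `estimatedJSONSize` are hypothetical names made up for the example, and the number lengths rely on Swift's own `String(_:)` formatting, which may differ slightly from what a real JSON encoder writes for floating-point values.

```swift
// Hypothetical JSON model used only for this sketch; not a Couchbase Lite type.
enum JSONValue {
    case null
    case bool(Bool)
    case int(Int)
    case double(Double)
    case string(String)
    case array([JSONValue])
    case object([String: JSONValue])
}

// Bytes needed for a JSON string literal: two quotes plus the UTF-8 bytes,
// with extra bytes for characters that JSON requires to be escaped.
func encodedStringSize(_ s: String) -> Int {
    var size = 2 // opening and closing quote
    for scalar in s.unicodeScalars {
        switch scalar {
        case "\"", "\\":
            size += 2 // written as \" or \\
        case "\n", "\r", "\t", "\u{08}", "\u{0C}":
            size += 2 // short escapes such as \n
        default:
            // Other control characters need \u00XX; everything else is raw UTF-8.
            size += scalar.value < 0x20 ? 6 : String(scalar).utf8.count
        }
    }
    return size
}

// Recursively adds up the compact (no whitespace) encoded size of a value.
func estimatedJSONSize(_ value: JSONValue) -> Int {
    switch value {
    case .null:
        return 4 // null
    case .bool(let b):
        return b ? 4 : 5 // true / false
    case .int(let n):
        return String(n).utf8.count // decimal digits, plus sign if negative
    case .double(let d):
        return String(d).utf8.count // assumption: Swift's formatting
    case .string(let s):
        return encodedStringSize(s)
    case .array(let items):
        var size = 2 + max(items.count - 1, 0) // brackets + commas
        for item in items {
            size += estimatedJSONSize(item)
        }
        return size
    case .object(let dict):
        var size = 2 + max(dict.count - 1, 0) // braces + commas
        for (key, item) in dict {
            size += encodedStringSize(key) + 1 + estimatedJSONSize(item) // +1 for the colon
        }
        return size
    }
}

// {"name":"Ann","tags":[1,22]} is 28 bytes.
let sample = JSONValue.object([
    "name": .string("Ann"),
    "tags": .array([.int(1), .int(22)])
])
print(estimatedJSONSize(sample))
```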

Can you explain what you need this for?

Oh, I forgot the obvious solution: you could just call NSJSONSerialization to convert the document’s properties into JSON, and get the length of the returned NSData object.

I swear I tried this before and it didn’t work. Just tried it again and it’s working great! Here is the code (Swift 3.0) for future reference.

/******************************************************************************/
func getDocumentSize(_ doc : CBLDocument) -> UInt64
{
    var size = 0

    do
    {
        let json_doc = try JSONSerialization.data(withJSONObject: doc.properties!)

        size = json_doc.count
    }
    catch
    {
        print(error.localizedDescription)
    }

    return UInt64(size)
}

Don’t you mean json_doc.length?

In Swift 3.0, the “Data” structure returned by JSONSerialization.data uses count instead of length. Swift likes to be different :wink:

public struct Data : ...
{
   ...

    /// The number of bytes in the data.
    public var count: Int

   ...
}
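As a quick illustration of that property (a minimal sketch, not tied to Couchbase Lite): `count` on a `Data` value is the number of bytes, which can differ from the number of characters in the string it was built from.

```swift
import Foundation

// "h\u{E9}llo" is "héllo" with a precomposed é, which takes two bytes in UTF-8.
let text = "h\u{E9}llo"
let data = Data(text.utf8)

let characterCount = text.count // 5 characters
let byteCount = data.count      // 6 bytes

print(characterCount, byteCount)
```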

Some CBL documents in my iOS and Android app will exceed the 20MB limit sooner or later, so I need to handle this somehow. The solution I have in mind is to check the CBLDocument size when adding content and, once the size has reached 20MB, create another document. So I have the same question, for both Swift and Java. I noticed the last message in this thread was from Feb 2017. What is the answer to the original question in November 2019 (e.g. do newer versions of Couchbase Lite offer a convenient approach for this)?

Thanks !

Honestly, I think it’s a bad idea to have documents anywhere near that large. (You wouldn’t put 20MB of data into a single row in a traditional database, would you?) It introduces performance problems:

  • Any time you load the document, even to read one field, 20MB of data has to be read from disk.
  • Any time a query reads the document, even for just one field, it also has to read 20MB from disk.
  • Sync Gateway has to parse all 20MB of JSON when it stores the document, which is not fast.
  • The replicator does support deltas now, so it will only transmit the parts of the document that changed, but computing what changed can be expensive. (And there are limits: changes in an array are too complex for our current algorithms to handle, so it will send the whole array even if just one item changed.)

If changing your schema to break these up into smaller documents isn’t an option, I suggest taking the portions of the document that aren’t necessary for queries, or which change less often, and making them into a blob/attachment in JSON format. (Or multiple attachments, to keep them under 20MB.)

Hi Jens,
Thanks for the quick and detailed response. I don’t care much about loading the document or querying it, as that will almost never happen on the client side or on the production server in my use case (this type of document is primarily for record-keeping purpose, which is why its size will grow over time, and I need to let it grow), but the 3rd and 4th points are indeed issues for me.
Breaking up the document into smaller documents isn’t an option for me (because even one of the fields could eventually grow beyond 20MB for active users). What are the size limits on the “blob/attachment in JSON format” you mentioned? I mean what’s the size limit per blob/attachment and what’s the limit on the number of such blob/attachment per document? And are there any performance considerations for blobs/attachments when the total size is on the order of tens of MB?

Thanks!

(this type of document is primarily for record-keeping purpose, which is why its size will grow over time, and I need to let it grow)

It would probably be more efficient to create new documents rather than keep appending to an existing document. Or at least only append to an existing doc for a limited time or until it reaches a size threshold.
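A rough sketch of that roll-over idea, using plain dictionaries rather than any Couchbase Lite type. `ChunkedLog`, `jsonSize` and the `"entries"` key are hypothetical names for illustration only, and re-encoding the whole chunk on every append is wasteful for large chunks; a real implementation would probably keep a running total instead.

```swift
import Foundation

// Hypothetical helper, not part of Couchbase Lite: keeps a list of plain
// dictionaries ("chunks") and starts a new chunk when adding an entry would
// push the current chunk's compact JSON encoding past `maxBytes`.
struct ChunkedLog {
    let maxBytes: Int
    private(set) var chunks: [[String: Any]] = [["entries": [String]()]]

    init(maxBytes: Int) {
        self.maxBytes = maxBytes
    }

    // Byte length of the compact JSON encoding, or 0 if encoding fails.
    static func jsonSize(_ body: [String: Any]) -> Int {
        return (try? JSONSerialization.data(withJSONObject: body).count) ?? 0
    }

    mutating func append(_ entry: String) {
        var current = chunks[chunks.count - 1]
        var entries = current["entries"] as? [String] ?? []
        entries.append(entry)
        current["entries"] = entries

        if ChunkedLog.jsonSize(current) > maxBytes && entries.count > 1 {
            // Too big with the new entry: leave the old chunk as it was
            // and start a fresh one holding only the new entry.
            chunks.append(["entries": [entry]])
        } else {
            chunks[chunks.count - 1] = current
        }
    }
}

// Tiny threshold so the roll-over is visible; a real one would be in the MB range.
var log = ChunkedLog(maxBytes: 30)
log.append("aaaaaaaaaa") // {"entries":["aaaaaaaaaa"]} is 26 bytes, fits
log.append("bbbbbbbbbb") // would be 39 bytes, so a second chunk is started
print(log.chunks.count)
```

Each chunk would then be saved as its own document.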

what’s the size limit per blob/attachment and what’s the limit on the number of such blob/attachment per document?

Individual blobs are also limited to 20MB since they’re also stored as documents in Couchbase Server.
There is no limit on the number of blobs attached to a document, though.

Thanks Jens! I am thinking about setting the size threshold to 5MB; once a document reaches this size, I will just create another document. This circles back to my question: what’s the best way to programmatically get the CBLDocument size at run time on the mobile side (iOS and Android) in November 2019?

The premise doesn’t make much sense in terms of Couchbase Lite since it stores files in a completely different format than Couchbase Server does. The closest estimate you could probably get would be to transform your document into a JSON string piece by piece and then see how long it becomes (blobs would have to be special cased but they are all roughly the same size in the actual document, so you could probably just add a delta onto your total per blob). Compare the size to the size on Server and adjust as necessary.

Hi Borrrden,
Thanks for the info. Since this is quite a bit of work, could you consider adding an instance method (e.g. getDocSize() or estimateDocSize() ) in CBLDocument class in future releases?

The use case for this is very narrow so I wouldn’t get your hopes up, but I’ll let @priya.rajagopal and @pasin know about it.

All you have to do is convert the document body to JSON, with no whitespace. Non-ASCII characters in strings should be written as UTF-8 and not escaped.
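A minimal sketch of that, assuming you already have the document body as a plain `[String: Any]` dictionary. `compactJSONSize(of:)` is a hypothetical helper name, not a Couchbase Lite method. `JSONSerialization` with no options writes compact JSON and leaves non-ASCII characters as UTF-8; the pretty-printed size is computed only to show how much whitespace would inflate the number.

```swift
import Foundation

// Hypothetical helper: byte length of a dictionary's compact JSON encoding,
// or nil if the dictionary cannot be represented as JSON.
func compactJSONSize(of body: [String: Any]) -> Int? {
    guard JSONSerialization.isValidJSONObject(body) else { return nil }
    // No .prettyPrinted option, so no whitespace is added.
    // Caveat: on Apple platforms "/" is escaped as "\/" by default,
    // which adds one byte per slash.
    return try? JSONSerialization.data(withJSONObject: body, options: []).count
}

// {"a":"é"} is 10 bytes: the é is written as two UTF-8 bytes, not as \u00e9.
let body: [String: Any] = ["a": "\u{E9}"]
let compactSize = compactJSONSize(of: body)

// For comparison only: the same body with whitespace added.
let prettySize = (try? JSONSerialization.data(withJSONObject: body,
                                              options: [.prettyPrinted]).count) ?? 0

print(compactSize ?? -1, prettySize)
```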

There is nothing special to do about blobs. They are not sent or stored as part of the document body.