What to do when I reach the document size limit?


I'm developing an e-commerce site where I store ratings and other things inside my product document. I might get 10k ratings, and more arrive every day, so they all accumulate in that one document. What do I do when I reach the document size limit? Do I have to redesign my database?

Once you reach the document size limit you have to redesign.

One easy approach is to use overflow documents: keep the overflow document keys in a field (an array) on the main document and store the extra data in those separate documents, fetching all of them when you need the full set.

Or use separate collections and combine them with a N1QL JOIN.
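A minimal sketch of the overflow idea, assuming a chunk size and key scheme of my own invention (they are illustrative, not a Couchbase convention): the main document keeps an array of overflow keys, each overflow document holds a bounded slice of ratings, and reading means fetching the main document plus every key it lists.

```python
# Sketch: split an unbounded ratings list across overflow documents.
# CHUNK and the key scheme are illustrative assumptions.

CHUNK = 1000  # max ratings per overflow document

def chunk_ratings(product_id, ratings):
    """Return (main_doc, {key: overflow_doc}) for a list of ratings."""
    overflow = {}
    keys = []
    for i in range(0, len(ratings), CHUNK):
        key = f"product::{product_id}::ratings::{i // CHUNK}"
        overflow[key] = {"ratings": ratings[i:i + CHUNK]}
        keys.append(key)
    main = {"id": product_id, "rating_keys": keys}
    return main, overflow

def all_ratings(main, fetch):
    """Reassemble all ratings by fetching each listed overflow key.

    `fetch` stands in for a KV get by document key."""
    out = []
    for key in main["rating_keys"]:
        out.extend(fetch(key)["ratings"])
    return out
```

Each overflow document stays well under the size limit, and appending a rating only ever touches the last chunk (or creates a new one).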

But I have read that eBay handles 300k writes per second, so the document size limit could be reached within minutes. Or have I misunderstood something?

I create a products collection and keep recommendations in a separate collection, but if you have a marketplace you store all products in one collection, and that limit can be reached after a certain time.

Do you store all the products that are created in one document? Then the 20 MB size limit can be reached. And if you create a new collection for each product, the limit of 1000 collections can also be reached. So how do others do it?

I’m not sure I follow your volume concerns; I’m sure you’re not expecting 300k new reviews per second constantly, eBay-style? Or are you expecting a constant write rate of 300 KiB/s? (That’d be ~ 25 GiB a day raw.)

How big do you anticipate your individual reviews to be? 1 KiB on average?

I would typically model it as 1 collection “product” and 1 collection “rating” and join in my queries. In product I’d have one document per product and in ratings one document for each rating (i.e. each individual review).
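For concreteness, the two document shapes might look like this (field names chosen to match the queries that follow; the exact layout is of course up to you). First a product document, then one rating document:

```json
{ "id": "product 1", "name": "product one", "desc": "A product" }

{ "product": "product 1", "rating": 4,
  "comment": "Some user comment", "timestamp": "2023-01-01T00:00:00Z" }
```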

A select such as:

    SELECT p.id
          ,p.name
          ,p.desc
          ,(SELECT r.rating, r.comment, r.timestamp
            FROM `myBucket`.`myScope`.rating r
            WHERE r.product = p.id
            ORDER BY r.timestamp DESC
            LIMIT 100
           ) AS ratings
    FROM `myBucket`.`myScope`.product p
    WHERE ... /* filtering omitted */

Would yield a document with a field “ratings” built as an array of the latest (up to) 100 ratings for the product. So something like:

  "id": "product 1",
  "name": "product one",
  "desc": "A product",
  "ratings": [
       "comments": "Some user comment",
       "rating": 4
       "timestamp": "2023-01-01T00:00:00Z"
       "comments": "Some user comment",
       "rating": 3
       "timestamp": "2023-01-01T00:00:00Z"

(Though I may find it simpler to build pages with two separate queries, one for the product and one for the associated ratings.)

And something like:

SELECT p.name, r.rating, COUNT(1) AS num_ratings
FROM `myBucket`.`myScope`.product p
JOIN `myBucket`.`myScope`.rating r
  ON r.product = p.id
WHERE ... /* filtering omitted */
GROUP BY p.name, r.rating
ORDER BY p.name, r.rating DESC

would give a list of products (by name) with the number of ratings received at each rating value, rating descending. And:

SELECT p.name
      ,(SELECT AVG(r.rating)
        FROM `myBucket`.`myScope`.rating r
        WHERE r.product = p.id
       )[0] AS average_rating
FROM `myBucket`.`myScope`.product p
WHERE ... /* filtering omitted */

would give the product name and average rating.


(Obviously I’d index appropriately for the queries I’d be running.)

I wouldn’t have volume concerns as my individual documents would be small (e.g. ~ 1 KiB on average for a rating) and there aren’t practical limits - other than space - on the number of documents in a collection.
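Back-of-envelope arithmetic on that, assuming the ~1 KiB average rating size mentioned above: embedding ratings in the product document hits the 20 MB limit after roughly 20k ratings, whereas one-document-per-rating never grows any single document.

```python
# Why one-document-per-rating sidesteps the 20 MB document limit.
DOC_LIMIT = 20 * 1024 * 1024   # Couchbase max document size, bytes
RATING_SIZE = 1024             # assumed average rating size, 1 KiB

# Embedded in a single product document, the limit is hit at about:
max_embedded = DOC_LIMIT // RATING_SIZE
print(max_embedded)  # 20480 ratings, then the product document is full

# As separate documents, each rating stays at ~1 KiB regardless of count;
# 10k ratings are simply 10k small documents (~10 MiB of data in total).
```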

I might consider this product & rating collection scheme in multiple scopes if the site organisation warranted it; e.g. multiple “brands” each having their own scope - say “the prestige range” and “the value range” as two scopes each with its own product & ratings collection pair.



Thank you for your answer and the examples. I would also like some fields such as an array of media where users can put images or videos. But I understand your example. Thanks! Can you answer me another question?

If I buy the Developer Pro plan at $0.35 per node on Capella, how many concurrent queries can I run?

I’d likely be storing each binary attachment as an individual document and have the rating document include an array of IDs for the binary attachments. Fetching such attachments would then be by ID using the key-value lookup functionality of the SDKs, e.g. https://docs.couchbase.com/c-sdk/current/howtos/kv-operations.html.
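A sketch of that attachment pattern, with the document shapes and the `fetch` callback as my own illustration (in practice `fetch` would be a key-value `get` via the SDK, and the values would be real binary documents):

```python
# Sketch: a rating document referencing binary attachments by ID.
# The in-memory `store` stands in for KV gets against the cluster.

store = {
    "rating::r1": {"product": "product 1", "rating": 4,
                   "attachment_ids": ["att::a1", "att::a2"]},
    "att::a1": b"\x89PNG...",   # binary documents, fetched by key
    "att::a2": b"\x00\x01...",
}

def attachments(rating_key, fetch):
    """Fetch every attachment listed in a rating document."""
    rating = fetch(rating_key)
    return [fetch(att_id) for att_id in rating["attachment_ids"]]

blobs = attachments("rating::r1", store.__getitem__)
```

The query layer only ever sees the small rating document with its ID array; the (potentially large) binaries are fetched individually, and only when actually needed.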

The Query service doesn’t return binary content; you could of course circumvent this if you uuencoded them, but that would bloat the storage requirements somewhat.
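To quantify that bloat (using base64 here as the common choice; uuencode behaves similarly): every 3 raw bytes become 4 text characters, so the encoded form is about a third larger.

```python
import base64

# Text-encoding binary grows it by roughly a third:
# 3 raw bytes -> 4 base64 characters.
raw = bytes(range(256)) * 12          # 3072 bytes of "binary"
encoded = base64.b64encode(raw)
overhead = len(encoded) / len(raw)
print(len(encoded), overhead)         # 4096 bytes, ratio 4/3
```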

I can’t comment on the details of entitlement for any package and would ask that you submit your question via the form here: Couchbase Pricing for Capella, Server + Mobile Subscriptions


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.