When you design your documents, keep a few other considerations in mind. They help you decide whether to use one document or several to represent something in your application. Consider:
Whether you will represent the items as separate objects,
Whether you want to access the objects together at runtime,
Whether you want some data to be atomic; that is, a change to this data either takes effect all at once or fails and is not made at all.
The following guidelines describe when you might prefer a single document and when you might prefer several documents to represent your data.
When you use one document to contain all related data you typically get these benefits:
Your application data is denormalized,
You can read and write related information in one operation,
You eliminate the need for client-side joins.
If you put all information for a transaction in a single document, you can better guarantee atomicity.
When you provide a single document to represent an entire entity and any related records, the document is known as an 'aggregate.' You can also choose to use separate documents for different object types in your application. This approach is known as 'normalization' in database terms; it is the opposite of the denormalized, single-document design described above. In this case you provide cross references between objects, as we demonstrated earlier in the beer-brewery documents. You typically gain the following from separate documents:
Reduced data duplication,
Potentially better application performance and scale, because documents stay smaller,
Application objects do not need to be in the same document; separate documents may better reflect the objects as they are in the real world.
The following examples demonstrate the use of a single document versus separate documents for a simple blog. In the application, a user can create an entry with a title and content. Other users can add comments to the post. In the first case, we have a single JSON document that represents a blog post plus all the comments for the post:
{
  "post_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    {"format": "markdown", "body": "Awesome post!"},
    {"format": "markdown", "body": "Like it."}
  ]
}
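As a minimal sketch of what the single-document design offers, the following Python snippet uses a plain dictionary as a stand-in for the key-value store. The `store` dictionary and the `add_comment` helper are hypothetical and are not part of any Couchbase SDK; the truncated "html" field is left out. Reading the post with all its comments is one lookup, and adding a comment is one write of that same document.

```python
import json

# Hypothetical in-memory stand-in for a key-value store: key -> JSON string.
store = {}

# The aggregate document: the post and its comments live together.
# (The "html" field is omitted because it is truncated in the sample.)
post = {
    "post_id": "dborkar_Hello_World",
    "author": "dborkar",
    "type": "post",
    "title": "Hello World",
    "format": "markdown",
    "body": "Hello from [Couchbase](http://couchbase.com).",
    "comments": [
        {"format": "markdown", "body": "Awesome post!"},
        {"format": "markdown", "body": "Like it."},
    ],
}
store[post["post_id"]] = json.dumps(post)


def add_comment(key, body, fmt="markdown"):
    """Append a comment by rewriting the single aggregate document."""
    doc = json.loads(store[key])  # one read
    doc["comments"].append({"format": fmt, "body": body})
    store[key] = json.dumps(doc)  # one write
    return doc


# One lookup returns the post together with every comment.
loaded = json.loads(store["dborkar_Hello_World"])
comment_bodies = [c["body"] for c in loaded["comments"]]

# Adding a comment touches only this one document.
updated = add_comment("dborkar_Hello_World", "Nice!")
```

Because the post and its comments change together in one write, there is no moment at which the store holds a comment without its post. The trade-off is that the document grows with every comment.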
The next sample shows the same blog post, but this time we have split it into the actual entry and separate comment documents. First is the core blog post document as JSON. Notice that the key 'comments' now holds an array of two values, each of which is a reference to a comment document:
{
  "post_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": ["comment1_jchris_Hello_world", "comment2_kzeller_Hello_World"]
}
The next document contains the first comment associated with the post. It is stored under the key 'comment1_jchris_Hello_world', the same key the post lists in its "comments" array, and it repeats that key in its "comment_id" field:
{
  "comment_id": "comment1_jchris_Hello_world",
  "format": "markdown",
  "body": "Awesome post!"
}
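With separate documents, the application has to follow the references itself. The sketch below illustrates such a client-side join in Python, again with a dictionary standing in for the store. The `get` and `load_post_with_comments` helpers and the `lookups` counter are hypothetical names introduced for illustration, the comment documents are reduced to their "format" and "body" fields, and the second comment's body is taken from the single-document sample.

```python
import json

# Hypothetical in-memory stand-in for a key-value store: key -> JSON string.
store = {
    "dborkar_Hello_World": json.dumps({
        "post_id": "dborkar_Hello_World",
        "author": "dborkar",
        "type": "post",
        "title": "Hello World",
        "comments": [
            "comment1_jchris_Hello_world",
            "comment2_kzeller_Hello_World",
        ],
    }),
    "comment1_jchris_Hello_world": json.dumps(
        {"format": "markdown", "body": "Awesome post!"}
    ),
    "comment2_kzeller_Hello_World": json.dumps(
        {"format": "markdown", "body": "Like it."}
    ),
}

lookups = 0  # counts how many documents we fetch


def get(key):
    """Fetch and decode one document, counting the lookup."""
    global lookups
    lookups += 1
    return json.loads(store[key])


def load_post_with_comments(post_key):
    """Client-side join: fetch the post, then each referenced comment."""
    post = get(post_key)
    post["comments"] = [get(comment_key) for comment_key in post["comments"]]
    return post


joined = load_post_with_comments("dborkar_Hello_World")
```

Here one post with two comments costs three lookups instead of one. In exchange, each comment can be written or changed without rewriting the post body.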
The next example shows our beers and breweries as single and as separate documents. If we wanted to use a single-document approach to represent a beer, it could look like this in JSON:
{
  "beer_id": 10.0,
  "name": "Hoptimus Prime",
  "category": "North American Ale",
  "style": "Imperial or Double India Pale Ale",
  "brewery": {
    "name": "Legacy Brewing Co.",
    "address1": "Easy Peasy St.",
    "address2": "Suite 4",
    "city": "Baltimore",
    "state": "Maryland",
    "zip": "21215",
    "capacity": 10000
  },
  "updated": [2010, 7, 22, 20, 0, 20],
  "available": true
}
In this case we provide information about the brewery as a subset of the beer. But consider what happens when we have more than one beer from the same brewery:
{
  "beer_id": 12.0,
  "name": "Pleny the Hipster",
  "category": "Wheat Beer",
  "style": "Koelsch",
  "brewery": {
    "name": "Legacy Brewing Co.",
    "address1": "Easy Peasy St.",
    "address2": "Suite 4",
    "city": "Baltimore",
    "state": "Maryland",
    "zip": "21215",
    "capacity": 10000
  },
  "updated": [2011, 8, 2, 20, 0, 20],
  "available": true
}
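To illustrate the cost of this duplication, here is a small Python sketch. It assumes the brewery is embedded in each beer as a nested object with a "name" field; the keys `beer_10` and `beer_12`, the helper `update_brewery_field`, and the new value "Suite 5" are hypothetical. A change to one brewery detail has to be written into every beer document that embeds it.

```python
# Two beer documents that each embed the same brewery details
# (trimmed to the fields needed for the illustration).
beers = {
    "beer_10": {
        "name": "Hoptimus Prime",
        "brewery": {"name": "Legacy Brewing Co.", "address2": "Suite 4"},
    },
    "beer_12": {
        "name": "Pleny the Hipster",
        "brewery": {"name": "Legacy Brewing Co.", "address2": "Suite 4"},
    },
}


def update_brewery_field(docs, brewery_name, field, value):
    """Rewrite every beer that embeds the brewery; return the write count."""
    written = 0
    for doc in docs.values():
        if doc["brewery"]["name"] == brewery_name:
            doc["brewery"][field] = value
            written += 1
    return written


# The brewery moves to another suite: every embedding beer must be rewritten.
writes = update_brewery_field(beers, "Legacy Brewing Co.", "address2", "Suite 5")
```

The number of writes grows with the number of beers, and a failure partway through would leave some beers showing the old address and others the new one.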
Here we start to accumulate duplicate information, because every beer document carries the same brewery information. It makes more sense to store the brewery and the beers as separate documents and relate them through fields. Here is the revised, separate beer document. Notice that the "brewery" field now holds the brewery ID:
{
  "beer_id": 10.0,
  "name": "Hoptimus Prime",
  "category": "North American Ale",
  "style": "Imperial or Double India Pale Ale",
  "brewery": "leg_brew_10",
  "updated": [2010, 7, 22, 20, 0, 20],
  "available": true
}
And here is the associated brewery as a separate brewery document. In this case, we may simplify the document structure since it is separate, and provide all the brewery information at the same level:
{
  "brewery_id": "leg_brew_10",
  "name": "Legacy Brewing Co.",
  "address1": "Easy Peasy St.",
  "address2": "Suite 4",
  "city": "Baltimore",
  "state": "Maryland",
  "zip": "21215",
  "capacity": 10000
}
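The following Python sketch shows how an application might work with the separated documents. It is an illustration under stated assumptions: the dictionary `store` stands in for the database, the beer keys `beer_10` and `beer_12` and the helper `beer_with_brewery` are hypothetical, and the second beer is assumed to reference the same brewery ID. The brewery detail is changed in one place, and each beer picks it up when its reference is resolved.

```python
# Hypothetical in-memory stand-in for the store: key -> document.
store = {
    "leg_brew_10": {
        "brewery_id": "leg_brew_10",
        "name": "Legacy Brewing Co.",
        "address2": "Suite 4",
        "city": "Baltimore",
    },
    "beer_10": {"name": "Hoptimus Prime", "brewery": "leg_brew_10"},
    "beer_12": {"name": "Pleny the Hipster", "brewery": "leg_brew_10"},
}


def beer_with_brewery(beer_key):
    """Resolve the beer's brewery reference into the brewery document."""
    beer = dict(store[beer_key])  # copy, so the stored beer keeps its reference
    beer["brewery"] = store[beer["brewery"]]  # second lookup: the brewery
    return beer


# One write updates the brewery for every beer that references it.
store["leg_brew_10"]["address2"] = "Suite 5"

resolved = [beer_with_brewery(key) for key in ("beer_10", "beer_12")]
```

Each beer read now costs two lookups, but the brewery information exists exactly once, so it cannot drift out of sync between beers.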