Once you grasp the concept that you can model real-world objects
as documents and you understand the idea that you can create
relationships between documents, you may wonder how
should you go about representing the
relationships? For instance, if you have an object that has a
relationship of ownership/possession, do you always want to
include fields in that object which reference all the objects it
owns? In other words, if you follow this approach, when an
asteroid has craters, the asteroid document should contain
references to each crater document. In traditional relational
database terminology, this is called a
one-to-many relationship, and is often also
called a has-many relationship. In an
asteroid example we say the "asteroid has many craters" and
conceptually it would appear as follows:
Imagine we are creating a virtual universe containing asteroids. And all asteroids can have zero or more craters; users in the environment can create more craters on the asteroids, and the environment can also create more craters on an asteroid. In this case, we have a relationship of ownership/possession by our asteroid since an asteroid contains the craters that are on it. If we choose to express ownership of the craters by the asteroid and say the asteroid has-many craters, we would provide an asteroid document as follows:
{ "a_id" : "asteroidA", "craters" : ["crater1", "crater2" .... ], .... }
In the asteroid document, we reference the crater by crater ID in an array of craters. Each of the craters could be represented by the following JSON document:
{ "crater_id" : "crater1", "location" : [ "37.42N", "-112.165W" ], "depth" : 80 .... }
But because we are working with a flexible, document-centric design, we could instead put all the references to the object-that-owns in the objects that are owned. In the asteroid example, we would have references from each crater document to the asteroid document. In the relational world, we refer to this as a many-to-one relationship which is sometimes also called a belongs-to relationship. This alternate approach would appear as follows:
The respective asteroid and crater JSON documents for this approach would now appear as follows:
{ "a_id" : "asteroidA", .... }
In the asteroid document we have a unique asteroid ID field,
a_id which we can reference from our crater
documents. Each of the craters could be represented by the
following JSON document:
{ "crater_id" : "crater1", "on_asteroid" : "asteroidA" "location" : [ "37.42N", "-112.165W" ], "depth" : 80 .... }
With this alternate approach, we provide any information about a
relationship between asteroid and crater in each crater document.
We provide a field on_asteroid in each crater
document with the value linking us to the asteroid document.
So which of these two approaches is preferable for relating the two documents? There are two important considerations to keep in mind when you relate documents:
Issues of Contention: if you expect a lot of updates from different processes to occur to a document, creating several belongs-to relationships is more desirable.
In the case of our asteroid example, if we have all craters referenced in the asteroid document, we can expect a good amount of conflict and contention over asteroid document. As users create more craters, or as the environment creates more craters, we can expect conflict between the processes which are all trying to update crater information on a single asteroid document. We could use locks and check-and-sets to prevent conflict, but this would diminish read/write response time. So in this type of scenario, putting the link from craters to asteroid makes more sense to avoid contention.
Retrieving Information: how you relate documents or how you provide references between documents will influence the way you should retrieve data at a later point. Similarly, how you want to retrieve information will influence your decision on how to model your documents.
In this asteroid model, since we choose to reference from craters to asteroid to avoid contention, we need to use indexing and querying to find all craters associated with an asteroid. If we had chosen the first approach where the asteroid contains references to all craters, we could perform a multiple-retrieve with the list of craters to get the actual crater documents.
If we did not have this concern about contention in our asteroid
example, it would be preferable to use the
has-many approach, where one document has
references to multiple documents. This is because performing a
multiple-retrieve on a list of documents is always faster than
getting the same set of documents through indexing and querying.
Therefore, as long as there is less concern about contention, we
should use the has-many model as the preferred
approach. The advantages of this approach apply to all cases where
our object is relative static; for instance if you have a player
document you do not expect to change the player profile that
often. You could store references to player abilities in the
player document and then describe the abilities in separate
documents.
For more information about retrieving information using a multiple retrieve, or by using indexing and querying, see Section 3.6.2, “Retrieving Multiple Keys” and Chapter 4, Finding Data with Views