Although you are free to write views matching your data, you should keep in mind the performance and storage implications of creating and organizing the different design document and view definitions.
You should keep the following in mind while developing and deploying your views:
Quantity of Views per Design Document
Because the indexes for every map/reduce combination (view) within a given design document are updated at the same time, avoid declaring too many views within the same design document. For example, if you have a design document with five different views, all five views will be updated simultaneously, even if only one of the views is accessed.
This can result in increased view index generation times, especially for frequently accessed views. Instead, move frequently used views out to a separate design document.
The exact number of views per design document should be determined by the update frequency required by each of the included views and by how the view definitions are logically grouped. For example, if you have a view that needs to be updated with a high frequency (for example, comments on a blog post) and another view that needs to be updated less frequently (for example, top blog posts), separate the views into two design documents so that the comments view can be updated frequently and independently of the other view.
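As a minimal sketch of that split, the two views could live in separate design documents, each expressed as the JSON-style object a design document is stored as (a views map whose entries hold the map function source as a string). The design document and view names (comments, posts, by_post, top_posts) and the fields doc.type, doc.post_id and doc.score are hypothetical assumptions for illustration only.

```javascript
// Hypothetical design documents: names and fields (doc.type, doc.post_id,
// doc.score) are illustrative assumptions, not a real schema.

// High-frequency view, kept alone so it can be updated independently.
const commentsDesignDoc = {
  views: {
    by_post: {
      map: "function (doc, meta) { if (doc.type == 'comment' && doc.post_id) { emit(doc.post_id, null); } }"
    }
  }
};

// Lower-frequency view in its own design document, so updating the
// comments index does not also update this one.
const postsDesignDoc = {
  views: {
    top_posts: {
      map: "function (doc, meta) { if (doc.type == 'post' && typeof doc.score == 'number') { emit(doc.score, null); } }"
    }
  }
};

// Stored as _design/comments and _design/posts respectively.
const designDocs = { comments: commentsDesignDoc, posts: postsDesignDoc };
```

Keeping one view per design document here is a deliberate choice: it trades a slightly larger number of design documents for independent update schedules.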
You can always configure when the view is updated by using the stale parameter (see Section 9.2.4, “Index Updates and the stale Parameter”). You can also configure different automated view update times for individual design documents; for more information, see Section 9.2.5, “Automated Index Updates”.
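For illustration, a view query that sets the stale parameter can be built as a plain URL. This sketch assumes the view REST endpoint on port 8092 and the stale values false, ok and update_after; the host, bucket, design document and view names are hypothetical, and Section 9.2.4 remains the authority on what each value does.

```javascript
// Accepted stale values (see Section 9.2.4 for the exact semantics):
//   "false"        - update the index before returning results
//   "ok"           - return results from the existing index, no update
//   "update_after" - return existing results, then trigger an update
const STALE_VALUES = new Set(["false", "ok", "update_after"]);

// Builds a view query URL; host, bucket, ddoc and view are hypothetical.
function viewQueryUrl(host, bucket, ddoc, view, stale) {
  if (!STALE_VALUES.has(stale)) {
    throw new Error(`unsupported stale value: ${stale}`);
  }
  const url = new URL(`http://${host}:8092/${bucket}/_design/${ddoc}/_view/${view}`);
  url.searchParams.set("stale", stale);
  return url.toString();
}
```

For example, viewQueryUrl("localhost", "blog", "comments", "by_post", "false") produces a query that waits for the comments index to be brought up to date.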
Modifying Existing Views
If you modify an existing view definition, or are executing a full build on a development view, the entire view will need to be recreated. In addition, all the views defined within the same design document will also be recreated.
Rebuilding all the views within a single design document is an expensive operation in terms of I/O and CPU requirements, as each document will need to be parsed by each view's map() and reduce() functions, with the resulting index stored on disk.
This rebuilding process occurs across all the nodes within the cluster and increases the overall disk I/O and CPU requirements until the view has been recreated. It takes place in addition to the work of keeping any production design documents and views up to date.
Don't Include Document ID
The document ID is automatically output by the view system when the view is accessed. When accessing a view without reduce enabled, you can always determine the document ID of the document that generated the row. You should therefore not include the document ID (from meta.id) in your key or value data.
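The sketch below uses a small hypothetical harness that compiles a map function's source with an injected emit() and attaches meta.id to every row itself, mimicking the behavior described above. It is not Couchbase's implementation; the documents and IDs are made up.

```javascript
// Minimal simulation of how map output becomes view rows (NOT Couchbase's
// implementation): each row carries the ID of the document that produced
// it, even though the map function never emits meta.id.
function buildIndex(mapSource, docs) {
  const rows = [];
  let currentId = null;
  // emit() is a free variable inside map functions, so inject it by
  // compiling the map source inside a function that receives it.
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const { doc, meta } of docs) {
    currentId = meta.id; // attached by the "view system", not by the map
    mapFn(doc, meta);
  }
  return rows;
}

// Hypothetical documents and IDs.
const docs = [
  { doc: { name: "Ann" }, meta: { id: "user::1" } },
  { doc: { name: "Bob" }, meta: { id: "user::2" } }
];

// Redundant: meta.id is copied into the key, so it appears twice per row.
const withId = buildIndex("function (doc, meta) { emit([doc.name, meta.id], null); }", docs);

// Preferred: the row still carries the ID without emitting it.
const withoutId = buildIndex("function (doc, meta) { emit(doc.name, null); }", docs);
```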
Check Document Fields
Fields and attributes from the source document that are used in map() or reduce() functions should be checked for existence before their values are used or compared. Because all the view definitions in a design document are processed at the same time, a runtime error in one of the views within a design document will cause the other views in the same design document not to be executed. A common cause of runtime errors in views is missing or invalid field and attribute checking.
The most common issue is accessing a field within an object that is null or does not exist. This generates a runtime error that will cause execution of all views within the design document to fail. To address this problem, check for the existence of a given object before it is used or its content value is checked. For example, the following view will fail if the doc.ingredient object does not exist, because accessing the ingredtext attribute of an undefined object throws an error:
function(doc, meta) { emit(doc.ingredient.ingredtext, null); }
Adding a check for the parent object before calling emit() ensures that emit() is not called unless the field exists in the source document:
function(doc, meta) { if (doc.ingredient) { emit(doc.ingredient.ingredtext, null); } }
The same check should be performed when comparing values
within the if statement.
This test should be performed on all objects where you are checking the attributes or child values (for example, indices of an array).
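The same idea applied to array elements can be sketched as follows, assuming a hypothetical ingredients array field (the text above only uses doc.ingredient). A small hypothetical harness simulates emit() so that both versions can be run against malformed documents; it is an illustration, not the view engine.

```javascript
// Hypothetical harness: compiles a map function's source with an injected
// emit() and runs it over test documents (not Couchbase's implementation).
function buildIndex(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const { doc, meta } of docs) {
    currentId = meta.id;
    mapFn(doc, meta);
  }
  return rows;
}

// Unchecked: throws a TypeError as soon as a document lacks "ingredients".
const unchecked = "function (doc, meta) { emit(doc.ingredients[0].ingredtext, null); }";

// Checked: confirms the array exists, then checks each element before
// reading a child attribute from it.
const checked = `function (doc, meta) {
  if (Array.isArray(doc.ingredients)) {
    for (var i = 0; i < doc.ingredients.length; i++) {
      var item = doc.ingredients[i];
      if (item && item.ingredtext) {
        emit(item.ingredtext, null);
      }
    }
  }
}`;

// Hypothetical recipe documents, including malformed ones.
const docs = [
  { doc: { ingredients: [{ ingredtext: "salt" }, null, { qty: 2 }] }, meta: { id: "recipe::1" } },
  { doc: { title: "no ingredients field" }, meta: { id: "recipe::2" } },
  { doc: { ingredients: "flour" }, meta: { id: "recipe::3" } }
];
```

Running the checked version over these documents emits a single row for "salt" and skips the null element, the element without ingredtext, and the documents whose ingredients field is missing or not an array.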
View Size, Disk Storage and I/O
Within the map function, the information declared within
your emit() statement is included in
the view index data and stored on disk. Outputting this
information will have the following effects on your indexes:
Increased index size on disk: more detailed or complex key/value combinations in generated views result in more information being stored on disk.
Increased disk I/O: more I/O is needed to process and store the information on disk and to retrieve the data when the view is queried. A larger, more complex key/value definition in your view will increase the overall disk I/O required both to update and to read back the data.
As a result, the index can be quite large. In some cases, the size of the index can exceed the size of the original source data by a significant factor if multiple views are created, or if you include large portions of the document, or the entire document, in the view output.
For example, if each view contains the entire document as part of the value, and you define ten views, the size of your index files will be more than 10 times the size of the original data on which the view was created. With a 500-byte document and 1 million documents, the view index would be approximately 5GB with only 500MB of source data.
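The arithmetic in this example can be written out as a back-of-the-envelope estimate. The function name is hypothetical, and the estimate deliberately counts only the copied document bodies, so it is a lower bound rather than a prediction of actual file sizes.

```javascript
// Rough lower bound for views that emit the whole document as the value.
// Counts only the copied document bodies (decimal units) and ignores keys,
// document IDs and index structure overhead, so real index files will be
// larger than this figure.
function estimateValueBytes(docSizeBytes, docCount, viewCount) {
  return docSizeBytes * docCount * viewCount;
}

const docSizeBytes = 500;
const docCount = 1000000;
const viewCount = 10;

const sourceBytes = docSizeBytes * docCount; // 500,000,000 bytes = 500 MB
const indexBytes = estimateValueBytes(docSizeBytes, docCount, viewCount); // 5,000,000,000 bytes = 5 GB
```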
Including Value Data in Views
Views store both the key and the value emitted by emit(). To ensure the highest performance, views should only emit the minimum key data required to search and select information. The value output by emit() should only be used when the data is needed within a reduce() function.
You can obtain the document value by using the core Couchbase API to get individual documents or documents in bulk. Some SDKs can perform this operation for you automatically. See Couchbase SDKs.
This model also prevents inconsistencies between the emitted view data and the current document state, such as a view returning value data that is no longer stored in the document itself.
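The inconsistency can be illustrated with a hypothetical in-memory Map standing in for the bucket; a real application would use its SDK's get or bulk-get call instead. The document, its fields and its ID are made up for the sketch.

```javascript
// Hypothetical in-memory stand-in for the bucket.
const store = new Map();
store.set("player::1", { name: "Ann", experience: 10, level: 1 });

// Index row captured before the next index update: the value copies
// doc.level into the index.
const staleRow = { id: "player::1", key: 10, value: store.get("player::1").level };

// The document changes, but the index has not been updated yet.
store.set("player::1", { name: "Ann", experience: 10, level: 2 });

// Looking each document up by the row's ID returns its current state.
function fetchRows(rows, getDoc) {
  return rows.map((row) => ({ id: row.id, doc: getDoc(row.id) }));
}

const fetched = fetchRows([staleRow], (id) => store.get(id));
// staleRow.value still holds the old level, while fetched[0].doc.level
// reflects the update.
```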
For views that are not going to be used with reduce, you should output a null value:
function(doc, meta) { emit(doc.experience, null); }
This will create an optimized view containing only the information required, ensuring the highest performance when updating the view, and smaller disk usage.
Don't Include Entire Documents in View Output
A view index should be designed to provide base information and to point to the source document through the implicitly returned document ID. It is bad practice to include the entire document within your view output.
You can always access the full document data through the client libraries by later requesting the individual document data. This is typically much faster than including the full document data in the view index, and enables you to optimize the index performance without sacrificing the ability to load the full document data.
For example, the following view is poorly designed:
function(doc, meta) { emit(doc.experience, doc); }
This view includes the full document content in the index, which can have significant effects on performance and index size.
Instead, the view should be defined as:
function(doc, meta) { emit(doc.experience, null); }
You can then access the document data either individually through the client libraries, or by using the built-in client library option that obtains the document data for you.
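To get a feel for the size difference, the sketch below runs both map functions through a hypothetical harness that simulates emit() and uses the JSON length of each row as a rough proxy for stored bytes. The documents (including the 400-character bio field) are made up, and real on-disk sizes will differ.

```javascript
// Hypothetical harness that simulates emit() (not Couchbase's engine).
function buildIndex(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const { doc, meta } of docs) {
    currentId = meta.id;
    mapFn(doc, meta);
  }
  return rows;
}

// JSON length of all rows: a crude proxy for index size, nothing more.
function approxRowBytes(rows) {
  return rows.reduce((total, row) => total + JSON.stringify(row).length, 0);
}

// 1,000 made-up documents with a padded bio field.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({
    doc: { experience: i, name: `player ${i}`, bio: "x".repeat(400) },
    meta: { id: `player::${i}` }
  });
}

const bad = buildIndex("function (doc, meta) { emit(doc.experience, doc); }", docs);
const good = buildIndex("function (doc, meta) { emit(doc.experience, null); }", docs);
// approxRowBytes(bad) is roughly an order of magnitude larger than
// approxRowBytes(good) for these documents.
```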
Using Document Types
If you are using a document type (a field in the stored JSON that indicates the document structure), be aware that on a large database the view's map function will still be called for documents of other types whenever the index is updated, even though those documents are never added to the index.
For example, in a database storing game objects with a standard list of objects and the users that interact with them, you might use a field in the JSON (here, type) to indicate 'object' or 'player'. Consider a view that outputs information only when the document is an object:
function(doc, meta) { if (doc.type == 'object') { emit(doc.experience, null); } }
If only players are added to the bucket, the map/reduce functions to update this view will be executed when the view is updated, even though no new objects are being added to the database. Over time, this can add a significant overhead to the view building process.
In a database organized like this, it can be easier from an application perspective to use separate buckets for the objects and players, and therefore keep the view index updates and structures completely separate without needing to check the document type during processing.
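The overhead can be sketched with a small hypothetical harness that counts how often a map function is invoked while processing changed documents. It assumes the type field is named type and that only newly added player documents are being processed; both are illustrative assumptions, and the harness is not Couchbase's update mechanism.

```javascript
// Hypothetical harness: runs a map function over changed documents and
// counts invocations, whether or not anything is emitted.
function runIncrementalUpdate(mapSource, changedDocs) {
  let calls = 0;
  let currentId = null;
  const rows = [];
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const { doc, meta } of changedDocs) {
    calls += 1; // the map function runs for every changed document
    currentId = meta.id;
    mapFn(doc, meta);
  }
  return { calls, rows };
}

// View that only indexes documents whose hypothetical type is 'object'.
const objectsView = "function (doc, meta) { if (doc.type == 'object') { emit(doc.experience, null); } }";

// Only players were added since the last update.
const newPlayers = [
  { doc: { type: "player", experience: 5 }, meta: { id: "player::1" } },
  { doc: { type: "player", experience: 7 }, meta: { id: "player::2" } },
  { doc: { type: "player", experience: 9 }, meta: { id: "player::3" } }
];

const result = runIncrementalUpdate(objectsView, newPlayers);
// result.calls counts three invocations, yet result.rows stays empty.
```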
Use Built-in Reduce Functions
Where possible, use one of the supplied built-in reduce functions: _sum, _count, or _stats.
These functions are highly optimized. Using a custom reduce function requires additional processing and may impose additional build time on the production of the index.
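A built-in reducer is selected by naming it in the view definition instead of supplying JavaScript source. The sketch below shows a hypothetical design document using _stats, followed by a plain-JavaScript emulation of the values the built-ins return (the _stats fields sum, count, min, max and sumsqr). The emulation exists only to show the results; it is not how the built-ins are implemented, and the view name and fields are assumptions.

```javascript
// Hypothetical design document: the map emits the numeric value the
// reducer needs, and "reduce" names the built-in function. doc.type and
// doc.experience are illustrative fields.
const statsDesignDoc = {
  views: {
    experience_stats: {
      map: "function (doc, meta) { if (doc.type == 'player' && typeof doc.experience == 'number') { emit(doc.type, doc.experience); } }",
      reduce: "_stats"
    }
  }
};

// Emulation of the built-in results for a NON-EMPTY list of numbers.
// Illustration only: the real built-ins are implemented inside the view
// engine and also handle rereduce.
function emulateBuiltins(values) {
  const sum = values.reduce((acc, v) => acc + v, 0);
  return {
    _count: values.length,
    _sum: sum,
    _stats: {
      sum,
      count: values.length,
      min: Math.min(...values),
      max: Math.max(...values),
      sumsqr: values.reduce((acc, v) => acc + v * v, 0)
    }
  };
}
```

For the values 1, 2 and 3, the emulation gives a count of 3, a sum of 6, and a sum of squares of 14.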