Often the information that you are searching or reporting on needs to be summarized or reduced. This is useful on a number of occasions, for example when you want to obtain a count of all the items of a particular type, such as comments, recipes matching an ingredient, or blog entries against a keyword.
When using a reduce function in your view, the value that you
specify in the call to emit() is replaced
with the value generated by the reduce function. This is
because the value specified by emit() is
used as one of the input parameters to the reduce function.
The reduce function is designed to reduce a group of values
emitted by the corresponding map()
function.
Alternatively, reduce can be used for performing sums, for example totalling all the invoice values for a single client, or totalling up the preparation and cooking times in a recipe. More generally, it can perform any calculation that can be applied to a group of the emitted data.
In each of the above cases, the raw data is the information from
one or more rows of information produced by a call to
emit(). The input data, each record
generated by the emit() call, is reduced
and grouped together to produce a new record in the output.
The grouping is performed based on the value of the emitted key, with the rows of information generated during the map phase being reduced and collated according to the uniqueness of the emitted key.
When using a reduce function the reduction is applied as follows:
For each record of input, the corresponding reduce function is applied on the row, and the return value from the reduce function is the resulting row.
For example, using the built-in _sum
reduce function, the value in each case
would be totaled based on the emitted key:
{ "rows" : [ {"value" : 13000, "id" : "James", "key" : "James" }, {"value" : 20000, "id" : "James", "key" : "James" }, {"value" : 5000, "id" : "Adam", "key" : "Adam" }, {"value" : 8000, "id" : "Adam", "key" : "Adam" }, {"value" : 10000, "id" : "John", "key" : "John" }, {"value" : 34000, "id" : "John", "key" : "John" } ] }
Using the unique key of the name, the data generated by the map above would be reduced, using the key as the collator, to produce the following output:
{ "rows" : [ {"value" : 33000, "key" : "James" }, {"value" : 13000, "key" : "Adam" }, {"value" : 44000, "key" : "John" } ] }
In each case the values for the common keys (John, Adam, James) have been totalled, and the six input rows reduced to the three rows shown here.
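The collation described above can be sketched in plain JavaScript. This is only an illustration of the arithmetic, not how the view engine is implemented; the helper name `sumByKey` is hypothetical and the rows omit the `id` field for brevity.

```javascript
// Rows as emitted by the map phase (same values as the example above).
const mapRows = [
  { key: "James", value: 13000 },
  { key: "James", value: 20000 },
  { key: "Adam", value: 5000 },
  { key: "Adam", value: 8000 },
  { key: "John", value: 10000 },
  { key: "John", value: 34000 },
];

// Hypothetical helper: collate rows by key, then total each group's values.
function sumByKey(rows) {
  const totals = new Map();
  for (const { key, value } of rows) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  // One output row per unique key.
  return Array.from(totals, ([key, value]) => ({ key, value }));
}

const reducedRows = sumByKey(mapRows);
console.log(JSON.stringify({ rows: reducedRows }));
```

Each unique key produces exactly one output row, which is why six input rows collapse to three.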
Results are grouped on the key from the call to
emit() if grouping is selected at
query time. As shown in the previous example, the reduction
operates by taking the key as the group value and using
this as the basis of the reduction.
If you use an array as the key, and have selected the output to be grouped during querying you can specify the level of the reduction function, which is analogous to the element of the array on which the data should be grouped. For more information, see Section 9.8.4, “Grouping in Queries”.
The view definition is flexible. You can select whether the reduce function is applied when the view is accessed. This means that you can access both the reduced and unreduced (map-only) content of the same view. You do not need to create different views to access the two different types of data.
Whenever the reduce function is called, the generated view content contains the same key and value fields for each row, but the key is the selected group (or an array of the group elements according to the group level), and the value is the computed reduction value.
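As a rough sketch of selecting reduced or map-only output at query time, the snippet below builds two query URLs for the same view. It assumes the query parameters are named `reduce` and `group_level`; the host, port, bucket, design document, and view names are placeholders, and `viewQueryUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: append query options to a view URL.
function viewQueryUrl(base, options) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(options)) {
    params.set(name, String(value));
  }
  const query = params.toString();
  return query ? `${base}?${query}` : base;
}

// Placeholder endpoint; adjust to your own cluster, bucket, and view.
const base = "http://localhost:8092/default/_design/sales/_view/by_salesman";

// Reduced output, grouped on the first element of an array key (assumed parameter name).
const reducedUrl = viewQueryUrl(base, { group_level: 1 });

// Map-only output from the same view (assumed parameter name).
const mapOnlyUrl = viewQueryUrl(base, { reduce: false });

console.log(reducedUrl);
console.log(mapOnlyUrl);
```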
Couchbase includes three built-in reduce functions,
_count,
_sum,
and
_stats.
You can also write your own
custom
reduction functions.
The reduce function also has a final additional benefit. The results of the computed reduction are stored in the index along with the rest of the view information. This means that when accessing a view with the reduce function enabled, the information comes directly from the index content. Because the value is not computed at query time, queries place very little load on Couchbase Server and return very quickly, even when accessing information based on a range-based query.
The reduce() function is designed to
reduce and summarize the data emitted during the
map() phase of the process. It should
only be used to summarize the data, and not to transform the
output information or concatenate the information into a
single structure.
When using a composite structure, the size limit on the
composite structure within the reduce()
function is 64KB.
The _count function provides a simple count
of the input rows from the map()
function, using the keys and group level to provide a count of
the correlated items. The values generated during the
map() stage are ignored.
For example, using the input:
{ "rows" : [ {"value" : 13000, "id" : "James", "key" : ["James", "Paris"] }, {"value" : 20000, "id" : "James", "key" : ["James", "Tokyo"] }, {"value" : 5000, "id" : "James", "key" : ["James", "Paris"] }, {"value" : 7000, "id" : "Adam", "key" : ["Adam", "London"] }, {"value" : 19000, "id" : "Adam", "key" : ["Adam", "Paris"] }, {"value" : 17000, "id" : "Adam", "key" : ["Adam", "Tokyo"] }, {"value" : 22000, "id" : "John", "key" : ["John", "Paris"] }, {"value" : 3000, "id" : "John", "key" : ["John", "London"] }, {"value" : 7000, "id" : "John", "key" : ["John", "London"] } ] }
Enabling the reduce() function and using
a group level of 1 would produce:
{ "rows" : [ {"value" : 3, "key" : ["Adam" ] }, {"value" : 3, "key" : ["James"] }, {"value" : 3, "key" : ["John" ] } ] }
The reduction has produced a new result set with the key as an array based on the first element of the array from the map output. The value is the count of the number of records collated by the first element.
Using a group level of 2 would generate the following:
{ "rows" : [ {"value" : 1, "key" : ["Adam", "London"] }, {"value" : 1, "key" : ["Adam", "Paris" ] }, {"value" : 1, "key" : ["Adam", "Tokyo" ] }, {"value" : 2, "key" : ["James","Paris" ] }, {"value" : 1, "key" : ["James","Tokyo" ] }, {"value" : 2, "key" : ["John", "London"] }, {"value" : 1, "key" : ["John", "Paris" ] } ] }
Now the counts are for the keys matching both of the first two elements of the map output.
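The effect of the group level on counting can be sketched as follows. This is an illustrative simulation under simplifying assumptions: `countAtGroupLevel` is a hypothetical helper, and rows are ordered by a plain string comparison rather than the collation rules the view engine applies.

```javascript
// Sales rows from the example input (id fields omitted).
const salesRows = [
  { key: ["James", "Paris"], value: 13000 },
  { key: ["James", "Tokyo"], value: 20000 },
  { key: ["James", "Paris"], value: 5000 },
  { key: ["Adam", "London"], value: 7000 },
  { key: ["Adam", "Paris"], value: 19000 },
  { key: ["Adam", "Tokyo"], value: 17000 },
  { key: ["John", "Paris"], value: 22000 },
  { key: ["John", "London"], value: 3000 },
  { key: ["John", "London"], value: 7000 },
];

// Hypothetical helper: truncate each array key to `level` elements,
// then count the rows that share the truncated key.
function countAtGroupLevel(rows, level) {
  const counts = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(row.key.slice(0, level));
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  }
  return Array.from(counts, ([k, value]) => ({ key: JSON.parse(k), value }))
    // Simplified ordering; real view collation differs.
    .sort((a, b) => (JSON.stringify(a.key) < JSON.stringify(b.key) ? -1 : 1));
}

const levelOne = countAtGroupLevel(salesRows, 1);
const levelTwo = countAtGroupLevel(salesRows, 2);
console.log(JSON.stringify(levelOne));
console.log(JSON.stringify(levelTwo));
```

The emitted values are never read, which matches the description that `_count` ignores them.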
The built-in _sum function sums the values
from the map() function call, this time
summing up the information in the value for each row. The
information can either be a single number or, during a rereduce,
an array of numbers.
The input values must be numbers, not
string representations of numbers. The entire map/reduce
will fail if the reduce input is not in the correct format.
You should use the parseInt() or
parseFloat() function calls within your
map() function stage to ensure that the
input data is a number.
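A minimal sketch of this advice is shown below. The document fields (`salesman`, `city`, `sales`) are hypothetical, and the `emit()` defined here is only a stand-in so the example runs outside the view engine. Skipping unparseable values is one possible policy, not a requirement stated by the text.

```javascript
// Stand-in for the view engine's emit(); collects rows for inspection.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function that converts the value to a number before emitting it.
function map(doc, meta) {
  const amount = parseFloat(doc.sales);
  // Assumption: documents whose value cannot be parsed are skipped,
  // so that the reduce input only ever contains numbers.
  if (!isNaN(amount)) {
    emit([doc.salesman, doc.city], amount);
  }
}

map({ salesman: "Adam", city: "London", sales: "7000" }, { id: "sale_1" });
map({ salesman: "Adam", city: "Paris", sales: 19000 }, { id: "sale_2" });
map({ salesman: "Adam", city: "Tokyo", sales: "n/a" }, { id: "sale_3" });

console.log(JSON.stringify(emitted));
```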
For example, using the same sales source data, accessing the group level 1 view would produce the total sales for each salesman:
{ "rows" : [ {"value" : 43000, "key" : [ "Adam" ] }, {"value" : 38000, "key" : [ "James" ] }, {"value" : 32000, "key" : [ "John" ] } ] }
Using a group level of 2 you get the information summarized by salesman and city:
{ "rows" : [ {"value" : 7000, "key" : [ "Adam", "London" ] }, {"value" : 19000, "key" : [ "Adam", "Paris" ] }, {"value" : 17000, "key" : [ "Adam", "Tokyo" ] }, {"value" : 18000, "key" : [ "James", "Paris" ] }, {"value" : 20000, "key" : [ "James", "Tokyo" ] }, {"value" : 10000, "key" : [ "John", "London" ] }, {"value" : 22000, "key" : [ "John", "Paris" ] } ] }
The built-in _stats reduce function
produces statistical calculations for the input data. As with
the _sum function, the corresponding value
in the emit call should be a number. The generated statistics
include the sum, count, minimum (min),
maximum (max) and sum of squares
(sumsqr) of the input rows.
Using the sales data, a slightly truncated output at group level one would be:
{ "rows" : [ { "value" : { "count" : 3, "min" : 7000, "sumsqr" : 699000000, "max" : 19000, "sum" : 43000 }, "key" : [ "Adam" ] }, { "value" : { "count" : 3, "min" : 5000, "sumsqr" : 594000000, "max" : 20000, "sum" : 38000 }, "key" : [ "James" ] }, { "value" : { "count" : 3, "min" : 3000, "sumsqr" : 542000000, "max" : 22000, "sum" : 32000 }, "key" : [ "John" ] } ] }
The same fields in the output value are provided for each of the reduced output rows.
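To make the meaning of each field concrete, the following sketch computes the same five statistics for a plain array of numbers. The helper `stats` is hypothetical and only mirrors the field names shown in the output above; it is not the built-in implementation.

```javascript
// Hypothetical helper mirroring the fields of the _stats output.
function stats(values) {
  const result = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
  for (const v of values) {
    result.sum += v;
    result.count += 1;
    result.min = Math.min(result.min, v);
    result.max = Math.max(result.max, v);
    result.sumsqr += v * v; // sum of the squared values
  }
  return result;
}

// Adam's three sales from the example data.
const adamStats = stats([7000, 19000, 17000]);
console.log(JSON.stringify(adamStats));
```

For Adam's values this gives a `sumsqr` of 699000000, matching the sample output.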
The reduce() function has to work
slightly differently to the map()
function. In the primary form, a reduce()
function must convert the data supplied to it from the
corresponding map() function.
The core structure of the reduce function execution is shown in the figure below.
The base format of the reduce() function
is as follows:
function(key, values, rereduce) { … return retval; }
The reduce function is supplied three arguments:
key
The key is the unique key derived from
the map() function and the
group_level parameter.
values
The values argument is an array of all of
the values that match a particular key. For example, if
the same key is output three times, values
will be an array of three items, with each item
containing the value output by the
emit() function.
rereduce
The rereduce argument indicates whether the
function is being called as part of a re-reduce, that is,
the reduce function being called again to further reduce
the input data.
When rereduce is false:
The supplied key argument will be an
array where the first element is the
key as emitted by the map function,
and the second element is the id, the document ID that
generated the key.
The values argument is an array of values where each element of
the array matches the corresponding element within the
array of keys.
When rereduce is true:
key will be null.
values will be an array of values as
returned by a previous reduce()
function.
The function should return the reduced version of the
information using a return
statement. The format of the return value should match the
format required for the specified key.
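The two calling conventions described above can be sketched with a small custom reduction that returns the largest value in a group. The function name `maxReduce`, the document IDs, and the exact shape of the key argument (an array of [key, id] pairs when rereduce is false) are assumptions made for illustration.

```javascript
// Hypothetical custom reduce: largest value in the group.
function maxReduce(key, values, rereduce) {
  // In both modes `values` is an array of numbers: raw emitted values when
  // rereduce is false, earlier maxima when rereduce is true.
  var result = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i] > result) {
      result = values[i];
    }
  }
  return result;
}

// First pass (rereduce false): assumed [emittedKey, documentId] pairs as the key.
var firstA = maxReduce([["James", "doc1"], ["James", "doc2"]], [13000, 20000], false);
var firstB = maxReduce([["James", "doc3"]], [5000], false);

// Second pass (rereduce true): key is null, values are the earlier results.
var overall = maxReduce(null, [firstA, firstB], true);
console.log(overall);
```

Because a maximum of maxima is still the maximum, this function needs no separate rereduce branch; reductions whose output has a different meaning from their input, such as a count, do.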
Using this model as a template, it is possible to write the
full implementation of the built-in functions
_sum and _count when
working with the sales data and the standard
map() function below:
function(doc, meta) { emit(meta.id, null); }
The _count function returns a count of
all the records for a given key. Since the
values argument to the reduce function contains
an array of all the values for a given key, the length of the
array needs to be returned in the
reduce() function:
function(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}
To explicitly write the equivalent of the built-in
_sum reduce function, the sum of the supplied
array of values needs to be returned:
function(key, values, rereduce) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum = sum + values[i];
  }
  return sum;
}
In the above function, the array of data values is iterated over and added up, with the final value being returned.
reduce() functions should be
both transparent and standalone. For example, the
_sum function does not rely on global
variables or parsing of existing data, and does not need to call
itself, hence it is also transparent.
In order to handle incremental map/reduce functionality (i.e. updating an existing view), each function must also be able to handle and consume the function's own output. This is because in an incremental situation, the function must be able to handle both the new records and previously computed reductions.
This can be explicitly written as follows:
f(keys, values) = f(keys, [ f(keys, values) ])
This can be seen graphically in the illustration below, where previous reductions are re-supplied to the reduce function as an element of the array of values supplied to the reduce function.
That is, the input of a reduce function can be not only the
raw data from the map phase, but also the output of a previous
reduce phase. This is called rereduce,
and can be identified by the third argument to
reduce(). When the
rereduce argument is false, both the
key and values arguments are
arrays, with the corresponding element in each containing the
relevant key and value, i.e., key[1] is the
key related to the value of values[1]. When
rereduce is true, key is null and values contains
the results of previous reductions.
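The requirement that a reduce function can consume its own output can be checked directly. The sketch below compares a count that handles rereduce with a naive version that does not; both function names are hypothetical.

```javascript
// Naive count: ignores rereduce, so it cannot consume its own output.
function naiveCount(key, values) {
  return values.length;
}

// Count that treats rereduce input as partial counts to be added up.
function count(key, values, rereduce) {
  if (rereduce) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  return values.length;
}

var emittedValues = [13000, 20000, 5000];

// f(keys, values)
var direct = count(null, emittedValues, false);
// f(keys, [ f(keys, values) ])
var rereduced = count(null, [count(null, emittedValues, false)], true);
// The naive version counts the single partial result instead of the rows.
var naive = naiveCount(null, [naiveCount(null, emittedValues)]);

console.log(direct, rereduced, naive);
```

The naive version reports one record after a rereduce because it counts the partial result itself rather than adding up the counts it contains.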
An example of this can be seen by considering an expanded
version of the sum function showing the
supplied values for the first iteration of the view index
building:
function('James', [ 13000,20000,5000 ]) {...}
When a document with the 'James' key is added to the database, and the view operation is called again to perform an incremental update, the equivalent call is:
function('James', [ 19000, function('James', [ 13000,20000,5000 ]) ]) { ... }
In reality, the incremental call is supplied the previously computed value, and the newly emitted value from the new document:
function('James', [ 19000, 38000 ]) { ... }
Fortunately, the simplicity of the structure for
sum means that the function both expects
an array of numbers, and returns a number, so these can easily
be recombined.
If writing more complex reductions, where a compound value is
output, the reduce() function must be
able to handle processing an argument of the previous
reduction as the compound value in addition to the data
generated by the map() phase. For
example, to generate a compound output showing both the total
and count of values, a suitable reduce()
function could be written like this:
function(key, values, rereduce) {
  var result = {total: 0, count: 0};
  if (rereduce) {
    // values holds {total, count} objects from previous reductions
    for (var i = 0; i < values.length; i++) {
      result.total = result.total + values[i].total;
      result.count = result.count + values[i].count;
    }
  } else {
    // values holds the numbers emitted by the map phase
    result.total = sum(values);
    result.count = values.length;
  }
  return result;
}
The rereduce argument is checked to identify
whether the elements of the array supplied to the function
are objects (as output by a
previous reduce) or numbers (from the map phase), and the
return value is updated accordingly.
Using the sample sales data, and group level of two, the output from a reduced view may look like this:
{"rows":[ {"key":["Adam", "London"],"value":{"total":7000, "count":1}}, {"key":["Adam", "Paris"], "value":{"total":19000, "count":1}}, {"key":["Adam", "Tokyo"], "value":{"total":17000, "count":1}}, {"key":["James","Paris"], "value":{"total":18000, "count":2}}, {"key":["James","Tokyo"], "value":{"total":20000, "count":1}}, {"key":["John", "London"],"value":{"total":10000, "count":2}}, {"key":["John", "Paris"], "value":{"total":22000, "count":1}} ] }
Reduce functions must be written to cope with this scenario in order to support the incremental nature of view and index building. If this is not handled correctly, the index will fail to be built correctly.
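One way to gain confidence that a compound reduction copes with this scenario is to reduce the data in one pass, then again in partial batches followed by a rereduce, and compare the results. The sketch below does this for a total-and-count reduction. The `sum()` defined here is a stand-in for the helper the example above calls, and the batch split is arbitrary.

```javascript
// Stand-in for the sum() helper used by the reduce function.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Total-and-count reduction that handles both map output and its own output.
function totalAndCount(key, values, rereduce) {
  var result = { total: 0, count: 0 };
  if (rereduce) {
    for (var i = 0; i < values.length; i++) {
      result.total += values[i].total;
      result.count += values[i].count;
    }
  } else {
    result.total = sum(values);
    result.count = values.length;
  }
  return result;
}

var jamesSales = [13000, 20000, 5000];

// Single pass over all emitted values.
var oneShot = totalAndCount(null, jamesSales, false);

// Two partial batches, combined by a rereduce call.
var partials = [
  totalAndCount(null, jamesSales.slice(0, 2), false),
  totalAndCount(null, jamesSales.slice(2), false)
];
var combined = totalAndCount(null, partials, true);

console.log(JSON.stringify(oneShot), JSON.stringify(combined));
```

If the two results differ for any split of the input, the function is not safe for incremental index updates.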