9.5.2. Reduce Functions

9.5.2.1. Built-in _count
9.5.2.2. Built-in _sum
9.5.2.3. Built-in _stats
9.5.2.4. Writing Custom Reduce Functions
9.5.2.5. Re-writing the built-in Reduce Functions
9.5.2.6. Handling Rereduce

Often the information that you are searching or reporting on needs to be summarized or reduced. This is useful on a number of occasions, for example when you want to obtain a count of all the items of a particular type, such as comments, recipes matching an ingredient, or blog entries against a keyword.

Note

When using a reduce function in your view, the value that you specify in the call to emit() is replaced with the value generated by the reduce function. This is because the value specified by emit() is used as one of the input parameters to the reduce function. The reduce function is designed to reduce a group of values emitted by the corresponding map() function.

Alternatively, reduce can be used for performing sums, for example totalling all the invoice values for a single client, or totalling up the preparation and cooking times in a recipe. Any calculation that can be performed on a group of the emitted data can be expressed as a reduction.

In each of the above cases, the raw data is the information from one or more rows of information produced by a call to emit(). The input data, each record generated by the emit() call, is reduced and grouped together to produce a new record in the output.

The grouping is performed based on the value of the emitted key, with the rows of information generated during the map phase being reduced and collated according to the uniqueness of the emitted key.

When using a reduce function, note the following about how the reduction is applied:

The view definition is flexible. You can select whether the reduce function is applied when the view is accessed. This means that you can access both the reduced and unreduced (map-only) content of the same view. You do not need to create different views to access the two different types of data.

Whenever the reduce function is called, the generated view content contains the same key and value fields for each row, but the key is the selected group (or an array of the group elements according to the group level), and the value is the computed reduction value.
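
As a rough illustration of choosing between reduced and map-only output at query time, the sketch below builds two view query URLs. It assumes the reduce and group_level view query parameters; the host, port, bucket, design document, and view names are placeholders invented for this example, not values from this manual.

```javascript
// Hypothetical names: adjust host, bucket, design document, and view for your cluster.
function viewUrl(options) {
  var base = "http://localhost:8092/sales/_design/reports/_view/by_salesman";
  var query = new URLSearchParams(options).toString();
  return query ? base + "?" + query : base;
}

// Reduced output, grouped on the first element of the emitted key.
var reducedUrl = viewUrl({ group_level: 1 });

// Map-only output from the same view, with the reduce step skipped.
var mapOnlyUrl = viewUrl({ reduce: false });

console.log(reducedUrl);
console.log(mapOnlyUrl);
```

Both URLs address the same view definition; only the query parameters differ.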

Couchbase includes three built-in reduce functions, _count, _sum, and _stats. You can also write your own custom reduction functions.

The reduce function also has a final additional benefit. The results of the computed reduction are stored in the index along with the rest of the view information. This means that when accessing a view with the reduce function enabled, the information comes directly from the index content. Because the value is not computed at query time, the query has a very low impact on the Couchbase Server and returns very quickly, even when accessing information based on a range-based query.

Note

The reduce() function is designed to reduce and summarize the data emitted during the map() phase of the process. It should only be used to summarize the data, and not to transform the output information or concatenate the information into a single structure.

When using a composite structure, the size limit on the composite structure within the reduce() function is 64KB.

9.5.2.1. Built-in _count

The _count function provides a simple count of the input rows from the map() function, using the keys and group level to provide a count of the correlated items. The values generated during the map() stage are ignored.

For example, using the input:

JSON
{
   "rows" : [
      {"value" : 13000, "id" : "James", "key" : ["James", "Paris"] },
      {"value" : 20000, "id" : "James", "key" : ["James", "Tokyo"] },
      {"value" : 5000,  "id" : "James", "key" : ["James", "Paris"] },
      {"value" : 7000,  "id" : "Adam",  "key" : ["Adam",  "London"] },
      {"value" : 19000, "id" : "Adam",  "key" : ["Adam",  "Paris"] },
      {"value" : 17000, "id" : "Adam",  "key" : ["Adam",  "Tokyo"] },
      {"value" : 22000, "id" : "John",  "key" : ["John",  "Paris"] },
      {"value" : 3000,  "id" : "John",  "key" : ["John",  "London"] },
      {"value" : 7000,  "id" : "John",  "key" : ["John",  "London"] }
   ]
}

Enabling the reduce() function and using a group level of 1 would produce:

JSON
{
   "rows" : [
      {"value" : 3, "key" : ["Adam" ] },
      {"value" : 3, "key" : ["James"] },
      {"value" : 3, "key" : ["John" ] }
   ]
}

The reduction has produced a new result set with the key as an array based on the first element of the array from the map output. The value is the count of the number of records collated by the first element.

Using a group level of 2 would generate the following:

JSON
{
   "rows" : [
      {"value" : 1, "key" : ["Adam", "London"] },
      {"value" : 1, "key" : ["Adam", "Paris" ] },
      {"value" : 1, "key" : ["Adam", "Tokyo" ] },
      {"value" : 2, "key" : ["James","Paris" ] },
      {"value" : 1, "key" : ["James","Tokyo" ] },
      {"value" : 2, "key" : ["John", "London"] },
      {"value" : 1, "key" : ["John", "Paris" ] }
   ]
}

Now the counts are for the keys matching both of the first two elements of the map output.
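
The effect of the group level on the count can be imitated outside the server. The following is a minimal sketch, not the server's implementation: countByGroupLevel is a hypothetical helper that truncates each emitted key to the requested number of elements and counts the rows that share the truncated key. The final ordering uses a plain string sort, which happens to match the example output but is not the view collation order in general.

```javascript
// Map output rows from the example above (the values are ignored when counting).
var rows = [
  { key: ["James", "Paris"],  value: 13000 },
  { key: ["James", "Tokyo"],  value: 20000 },
  { key: ["James", "Paris"],  value: 5000 },
  { key: ["Adam",  "London"], value: 7000 },
  { key: ["Adam",  "Paris"],  value: 19000 },
  { key: ["Adam",  "Tokyo"],  value: 17000 },
  { key: ["John",  "Paris"],  value: 22000 },
  { key: ["John",  "London"], value: 3000 },
  { key: ["John",  "London"], value: 7000 }
];

// Group rows on the first `groupLevel` elements of each key and count them.
function countByGroupLevel(rows, groupLevel) {
  var counts = {};
  rows.forEach(function (row) {
    var groupKey = JSON.stringify(row.key.slice(0, groupLevel));
    counts[groupKey] = (counts[groupKey] || 0) + 1;
  });
  // Simple string sort for display; real view output follows view collation rules.
  return Object.keys(counts).sort().map(function (groupKey) {
    return { key: JSON.parse(groupKey), value: counts[groupKey] };
  });
}

var levelOne = countByGroupLevel(rows, 1);
var levelTwo = countByGroupLevel(rows, 2);
console.log(JSON.stringify(levelOne));
console.log(JSON.stringify(levelTwo));
```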

9.5.2.2. Built-in _sum

The built-in _sum function sums the values from the map() function call, this time using the information in the value for each row. Each input can be either a single number or, during a rereduce, an array of numbers.

Note

The input values must be numbers, not string representations of numbers. The entire map/reduce will fail if the reduce input is not in the correct format. Use the parseInt() or parseFloat() function within your map() function to ensure that the emitted data is a number.
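
A map() function that guards against string input might look like the sketch below. The document fields (salesman, city, amount) are assumptions made for illustration, and the emit() defined here is only a stand-in for the one supplied by the view engine so that the example can run on its own.

```javascript
// Stand-in for the emit() provided by the view engine.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Convert the (hypothetical) amount field to a number before emitting it,
// and skip documents whose amount cannot be parsed.
var map = function (doc, meta) {
  var amount = parseFloat(doc.amount);
  if (!isNaN(amount)) {
    emit([doc.salesman, doc.city], amount);
  }
};

map({ salesman: "James", city: "Paris", amount: "13000" }, { id: "sale_1" });
map({ salesman: "James", city: "Tokyo", amount: 20000 }, { id: "sale_2" });
map({ salesman: "Adam", city: "London", amount: "n/a" }, { id: "sale_3" });

console.log(JSON.stringify(emitted));
```

Skipping unparseable documents is one possible policy; emitting 0 or a separate error key are alternatives, depending on how the totals are used.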

For example, using the same sales source data, accessing the group level 1 view would produce the total sales for each salesman:

JSON
{
   "rows" : [
      {"value" : 43000, "key" : [ "Adam"  ] },
      {"value" : 38000, "key" : [ "James" ] },
      {"value" : 32000, "key" : [ "John"  ] }
   ]
}

Using a group level of 2 you get the information summarized by salesman and city:

JSON
{
   "rows" : [
      {"value" : 7000,  "key" : [ "Adam",  "London" ] },
      {"value" : 19000, "key" : [ "Adam",  "Paris"  ] },
      {"value" : 17000, "key" : [ "Adam",  "Tokyo"  ] },
      {"value" : 18000, "key" : [ "James", "Paris"  ] },
      {"value" : 20000, "key" : [ "James", "Tokyo"  ] },
      {"value" : 10000, "key" : [ "John",  "London" ] },
      {"value" : 22000, "key" : [ "John",  "Paris"  ] }
   ]
}

9.5.2.3. Built-in _stats

The built-in _stats reduce function produces statistical calculations for the input data. As with the _sum function, the corresponding value in the emit call should be a number. The generated statistics include the sum, count, minimum (min), maximum (max) and sum of squares (sumsqr) of the input rows.

Using the sales data, a slightly truncated output at group level one would be:

JSON
{
   "rows" : [
      {
         "value" : {
            "count" : 3,
            "min" : 7000,
            "sumsqr" : 699000000,
            "max" : 19000,
            "sum" : 43000
         },
         "key" : [
            "Adam"
         ]
      },
      {
         "value" : {
            "count" : 3,
            "min" : 5000,
            "sumsqr" : 594000000,
            "max" : 20000,
            "sum" : 38000
         },
         "key" : [
            "James"
         ]
      },
      {
         "value" : {
            "count" : 3,
            "min" : 3000,
            "sumsqr" : 542000000,
            "max" : 22000,
            "sum" : 32000
         },
         "key" : [
            "John"
         ]
      }
   ]
}

The same fields in the output value are provided for each of the reduced output rows.
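
To make the meaning of each field concrete, the following sketch recomputes the statistics for the three Adam rows of the sales data. The stats helper is written for this example only and is not the server's implementation of _stats.

```javascript
// Compute the five statistics for an array of numbers.
function stats(values) {
  var result = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
  values.forEach(function (v) {
    result.sum += v;
    result.count += 1;
    result.min = Math.min(result.min, v);
    result.max = Math.max(result.max, v);
    result.sumsqr += v * v; // sum of the squares, not the square of the sum
  });
  return result;
}

var adam = stats([7000, 19000, 17000]);
console.log(JSON.stringify(adam));
```

The sum, count, and sumsqr fields are sufficient to derive the mean (sum / count) and the population variance (sumsqr / count minus the square of the mean) on the client side.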

9.5.2.4. Writing Custom Reduce Functions

The reduce() function has to work slightly differently to the map() function. In the primary form, a reduce() function must convert the data supplied to it from the corresponding map() function.

The core structure of the reduce function execution is shown in the figure below.

Figure 9.11. Views — Writing Custom Reduce Functions

The base format of the reduce() function is as follows:

Javascript
function(key, values, rereduce) {
  // compute retval from key, values, and rereduce
  return retval;
}

The reduce function is supplied three arguments:

  • key

    The key is the unique key derived from the map() function and the group_level parameter.

  • values

    The values argument is an array of all of the values that match a particular key. For example, if the same key is output three times, values will be an array of three items, with each item containing the value output by the emit() function.

  • rereduce

    The rereduce argument indicates whether the function is being called as part of a re-reduce, that is, the reduce function being called again to further reduce the input data.

    When rereduce is false:

    • The supplied key argument will be an array where the first element is the key as emitted by the map function, and the second element (the id) is the document ID that generated the key.

    • The values argument is an array of values where each element of the array matches the corresponding element within the array of keys.

    When rereduce is true:

    • key will be null.

    • values will be an array of values as returned by a previous reduce() function.

The function should return the reduced version of the information using the return statement. The format of the return value should match the format required for the specified key.

9.5.2.5. Re-writing the built-in Reduce Functions

Using this model as a template, it is possible to write the full implementation of the built-in functions _sum and _count when working with the sales data and the standard map() function below:

Javascript
function(doc, meta) 
{
  emit(meta.id, null);
}

The _count function returns a count of all the records for a given key. Since the values argument to the reduce function contains an array of all the values for a given key, the length of the array needs to be returned in the reduce() function. During a rereduce, the previously computed counts are summed instead:

Javascript
function(key, values, rereduce) {
   if (rereduce) {
       var result = 0;
       for (var i = 0; i < values.length; i++) {
           result += values[i];
       }
       return result;
   } else {
       return values.length;
   }
}
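
The reason for the separate rereduce branch can be checked with a small simulation. In the sketch below, the split into two batches is arbitrary and chosen for illustration (the server decides how values are batched), and null is passed as the key because the function does not use it.

```javascript
var countReduce = function (key, values, rereduce) {
  if (rereduce) {
    // values holds partial counts from earlier reduce calls: add them up.
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  }
  // values holds raw map output: count the entries.
  return values.length;
};

// Nine map values, as in the sales data, reduced in a single pass.
var values = [13000, 20000, 5000, 7000, 19000, 17000, 22000, 3000, 7000];
var singlePass = countReduce(null, values, false);

// The same values reduced in two batches, then combined by a rereduce.
var partials = [
  countReduce(null, values.slice(0, 4), false),
  countReduce(null, values.slice(4), false)
];
var combined = countReduce(null, partials, true);

console.log(singlePass, combined);
```

Returning values.length in the rereduce case would report the number of partial results (2 here) rather than the number of rows.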

To explicitly write the equivalent of the built-in _sum reduce function, the sum of the supplied array of values needs to be returned:

Javascript
function(key, values, rereduce) {
  // Works for rereduce as well: a sum of partial sums is the overall sum.
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum = sum + values[i];
  }
  return sum;
}

In the above function, the array of data values is iterated over and added up, with the final value being returned.

9.5.2.6. Handling Rereduce

reduce() functions should be both transparent and standalone. For example, the _sum function does not rely on global variables or parsing of existing data, and does not need to call itself, hence it is also transparent.

In order to handle incremental map/reduce functionality (i.e. updating an existing view), each function must also be able to handle and consume the function's own output. This is because in an incremental situation, the function must handle both the new records and previously computed reductions.

This can be explicitly written as follows:

Javascript
f(keys, values) = f(keys, [ f(keys, values) ])

This can be seen graphically in the illustration below, where previous reductions are re-supplied to the reduce function as elements of the array of values.

Figure 9.12. Views — Handling Rereduce

That is, the input of a reduce function can be not only the raw data from the map phase, but also the output of a previous reduce phase. This is called rereduce, and can be identified by the third argument to reduce(). When the rereduce argument is false, both the key and values arguments are arrays, with the corresponding element in each containing the relevant key and value. That is, key[1] is the key related to the value in values[1]. When rereduce is true, key is null and values contains the results of previous reductions.

An example of this can be seen by considering an expanded version of the sum function showing the supplied values for the first iteration of the view index building:

Javascript
function('James', [ 13000,20000,5000 ]) {...}

When a document with the 'James' key is added to the database, and the view operation is called again to perform an incremental update, the equivalent call is:

Javascript
function('James', [ 19000, function('James', [ 13000,20000,5000 ]) ]) { ... }

In reality, the incremental call is supplied the previously computed value, and the newly emitted value from the new document:

Javascript
function('James', [ 19000, 38000 ]) { ... }

Fortunately, the simplicity of the structure for sum means that the function both expects an array of numbers, and returns a number, so these can easily be recombined.
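
The incremental update described above can be traced with the sum function written out in full. This is a sketch of the calls only: the key is passed as null in the rereduce call, and the way the new value and the previous result are combined follows the simplified description in this section rather than the exact batching performed by the server.

```javascript
var sumReduce = function (key, values, rereduce) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
};

// First build of the index: three values for 'James'.
var previous = sumReduce("James", [13000, 20000, 5000], false);

// Incremental update: the new value is combined with the stored result.
var updated = sumReduce(null, [19000, previous], true);

console.log(previous, updated);
```

The function never inspects rereduce, because its input and output have the same type.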

If writing more complex reductions, where a compound value is output, the reduce() function must be able to process the compound value from a previous reduction in addition to the data generated by the map() phase. For example, to generate a compound output showing both the total and count of values, a suitable reduce() function could be written like this:

Javascript
function(key, values, rereduce) {
  var result = {total: 0, count: 0};
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is an object returned by a previous reduce
      result.total = result.total + values[i].total;
      result.count = result.count + values[i].count;
    } else {
      // values[i] is a number emitted by map()
      result.total = result.total + values[i];
      result.count = result.count + 1;
    }
  }
  return result;
}

The rereduce argument identifies whether the elements of the array supplied to the function are objects (as output by a previous reduce) or numbers (from the map phase), and the return value is updated accordingly.
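
One way to gain confidence in a compound reduce function is to confirm that reducing in stages gives the same answer as reducing everything at once. The sketch below does this for a total-and-count function; the function name, the batches, and the null key are choices made for this illustration.

```javascript
var totalAndCount = function (key, values, rereduce) {
  var result = { total: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // Object produced by an earlier reduce call.
      result.total += values[i].total;
      result.count += values[i].count;
    } else {
      // Number emitted by the map phase.
      result.total += values[i];
      result.count += 1;
    }
  }
  return result;
};

// Staged: reduce the existing values, reduce the new value, then rereduce.
var first = totalAndCount(null, [13000, 20000, 5000], false);
var second = totalAndCount(null, [19000], false);
var merged = totalAndCount(null, [first, second], true);

// Direct: reduce all four values in one call.
var direct = totalAndCount(null, [13000, 20000, 5000, 19000], false);

console.log(JSON.stringify(merged), JSON.stringify(direct));
```

If the two results differ for any split of the input, the function is not safe to use with incremental index updates.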

Using the sample sales data, and group level of two, the output from a reduced view may look like this:

JSON
{"rows":[
{"key":["Adam", "London"],"value":{"total":7000,  "count":1}},
{"key":["Adam", "Paris"], "value":{"total":19000, "count":1}},
{"key":["Adam", "Tokyo"], "value":{"total":17000, "count":1}},
{"key":["James","Paris"], "value":{"total":18000, "count":2}},
{"key":["James","Tokyo"], "value":{"total":20000, "count":1}},
{"key":["John", "London"],"value":{"total":10000, "count":2}},
{"key":["John", "Paris"], "value":{"total":22000, "count":1}}
]
}

Reduce functions must be written to cope with this scenario because of the incremental nature of view and index building. If this is not handled correctly, the index will fail to be built correctly.
