For date and time selection, consider how the data will need to be selected when it is retrieved. This is particularly important when you want to perform log roll-up or statistical collection by using a reduce function to count or quantify instances of a particular event over time.
Examples include querying data over a specific date range, on specific days or dates, or within specific time periods. In a traditional relational database, you can extract a specific date or date range by storing the information in the table as a date type.
Within map/reduce, the effect can be simulated by splitting the
date into its individual components at the level of detail that
you require. For example, to obtain a report that counts
individual log types for each day, you can use the following
map() function:
function(doc, meta) { emit([doc.year, doc.mon, doc.day, doc.logtype], null); }
By incorporating the full date into the key, the view provides the ability to search for specific dates and specific ranges. By modifying the view content you can simplify this process further. For example, if only searches by year/month are required for a specific application, the day can be omitted.
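To see which keys this map() function produces, you can exercise it outside the server with a stand-in for emit(). The following is a minimal sketch: the emit() stub, the rows array, and the sample documents are assumptions made for illustration. On the server, emit() is supplied by the view engine.

```javascript
// Collects emitted rows; a local stand-in for the emit() supplied by the view engine.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map() function from the text, given a name so it can be called locally.
function map(doc, meta) {
  emit([doc.year, doc.mon, doc.day, doc.logtype], null);
}

// Hypothetical sample documents with the date already split into components.
var docs = [
  { year: 2010, mon: 9, day: 1, logtype: "error" },
  { year: 2010, mon: 9, day: 1, logtype: "warning" },
  { year: 2010, mon: 9, day: 2, logtype: "error" }
];

docs.forEach(function (doc, i) {
  map(doc, { id: "log_" + i });
});

console.log(JSON.stringify(rows[0].key)); // [2010,9,1,"error"]
```

Each document yields one row whose key starts with the most significant date component, which is what makes range selection and grouping by year, month, or day possible.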
With the corresponding built-in reduce() function, _count, you
can perform a number of different queries. Without any form of
data selection, for example, you can use the group_level
parameter to summarize by year, month, or individual day.
Additionally, because the date is explicitly output,
information can be selected over a specific range, such as a
specific month:
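endkey=[2010,9,31]&group_level=4&startkey=[2010,9,0]
Here the explicit date has been specified as the start and end
key. Because each key also ends with the log type, rows for
30 September such as [2010,9,30,"error"] sort after [2010,9,30],
so the end key is set one day past the end of the month to
include them. The group_level of 4 is required to roll up by
both the date and the log type.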
This generates output similar to the following:
{"rows":[
  {"key":[2010,9,1,"error"],"value":5},
  {"key":[2010,9,1,"warning"],"value":10},
  {"key":[2010,9,2,"error"],"value":8},
  {"key":[2010,9,2,"warning"],"value":9},
  {"key":[2010,9,3,"error"],"value":16},
  {"key":[2010,9,3,"warning"],"value":8},
  {"key":[2010,9,4,"error"],"value":15},
  {"key":[2010,9,4,"warning"],"value":11},
  {"key":[2010,9,5,"error"],"value":6},
  {"key":[2010,9,5,"warning"],"value":12}
  ]
}
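The effect of group_level on a _count reduction can be sketched locally. The function below is a simplified stand-in, not the server's implementation: countByGroupLevel() and the sample keys are hypothetical, and the keys are assumed to arrive already sorted in view order.

```javascript
// Keys as emitted by the map() function: [year, mon, day, logtype].
var emitted = [
  [2010, 9, 1, "error"],
  [2010, 9, 1, "error"],
  [2010, 9, 1, "warning"],
  [2010, 9, 2, "error"],
  [2010, 10, 1, "warning"]
];

// Simplified stand-in for _count combined with group_level:
// truncate each key to `level` elements and count the rows per truncated key.
function countByGroupLevel(keys, level) {
  var counts = {};
  var order = [];
  keys.forEach(function (key) {
    var group = JSON.stringify(key.slice(0, level));
    if (!(group in counts)) {
      counts[group] = 0;
      order.push(group); // remember first-seen order of each group
    }
    counts[group] += 1;
  });
  return order.map(function (group) {
    return { key: JSON.parse(group), value: counts[group] };
  });
}

console.log(JSON.stringify(countByGroupLevel(emitted, 2)));
// [{"key":[2010,9],"value":4},{"key":[2010,10],"value":1}]
```

With a level of 4 the same function returns one row per date and log type, which mirrors the shape of the output shown above.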
Additional granularity, for example down to hours, minutes, or seconds, can be achieved by adding those components as further elements of the key emitted by the map() function:
function(doc, meta) { emit([doc.year, doc.mon, doc.day, doc.hour, doc.min, doc.logtype], null); }
The same technique can also be used to output keys based on other criteria, for example the day of the week, the week number of the year, or a period such as the quarter:
function(doc, meta) {
  if (doc.mon) {
    // Map months 1-12 to quarters 1-4
    var quarter = Math.floor((doc.mon - 1) / 3) + 1;
    emit([doc.year, quarter, doc.logtype], null);
  }
}
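The text mentions the day of the week and the week number without showing them. The following is a sketch of how they might be derived, assuming the document stores year, mon (1-12), and day fields as in the earlier examples. The helper names dayOfWeek(), isoWeek(), and mapByWeek() are hypothetical, the emit() stub stands in for the one provided by the view engine, and ISO 8601 week numbering is an assumption; your application may define weeks differently.

```javascript
// Local stand-in for the emit() supplied by the view engine.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Day of the week: 0 = Sunday ... 6 = Saturday.
function dayOfWeek(doc) {
  return new Date(Date.UTC(doc.year, doc.mon - 1, doc.day)).getUTCDay();
}

// ISO 8601 week: weeks start on Monday and week 1 contains the first
// Thursday of the year. Returns [weekYear, week] because the first and
// last days of a year can belong to a week of the adjacent year.
function isoWeek(doc) {
  var d = new Date(Date.UTC(doc.year, doc.mon - 1, doc.day));
  var dayNum = d.getUTCDay() || 7;            // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - dayNum);  // move to the Thursday of this week
  var yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  var week = Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
  return [d.getUTCFullYear(), week];
}

// Hypothetical map() that keys each log entry by week.
function mapByWeek(doc, meta) {
  if (doc.year && doc.mon && doc.day) {
    var week = isoWeek(doc);
    emit([week[0], week[1], doc.logtype], null);
  }
}

mapByWeek({ year: 2010, mon: 9, day: 1, logtype: "error" }, {});
console.log(JSON.stringify(rows[0].key)); // [2010,35,"error"]
```

A key based on dayOfWeek(doc) can be emitted in the same way when you want to compare, for example, weekday and weekend activity.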
To get more complex information, for example a count of
individual log types for a given date, you can combine the
map() and reduce() stages to provide the collation.
For example, the following map() function emits the date as the
key and the log type as the value, so that you can still collate
by year, month, or day as before, with data selection at the
date level:
function(doc, meta) { emit([doc.year, doc.mon, doc.day], doc.logtype); }
For convenience, you may wish to use the
dateToArray() function, which converts a
date object or string into an array. For example, if the date
has been stored within the document as a single field:
function(doc, meta) { emit(dateToArray(doc.date), doc.logtype); }
For more information, see
dateToArray().
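For experimenting outside the server, the conversion can be approximated locally. The sketch below is not the built-in: dateToArrayLocal() is a hypothetical stand-in, and it assumes the result holds UTC components in the order year, month, day, hour, minute, second. Check the dateToArray() reference for the exact behavior of your version.

```javascript
// Hypothetical local stand-in for the built-in dateToArray(); assumes UTC
// components ordered as [year, month, day, hour, minute, second].
function dateToArrayLocal(date) {
  var d = date instanceof Date ? date : new Date(date);
  if (isNaN(d.getTime())) {
    return null; // the input could not be parsed as a date
  }
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1, // getUTCMonth() is zero-based
    d.getUTCDate(),
    d.getUTCHours(),
    d.getUTCMinutes(),
    d.getUTCSeconds()
  ];
}

console.log(JSON.stringify(dateToArrayLocal("2010-09-01T14:30:05Z")));
// [2010,9,1,14,30,5]
```

Under that assumption the key has six elements, so a group_level of 3 still groups by day, while higher levels group by hour, minute, and second.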
Using the following reduce() function, the data can be collated
so that each day is output as a single record that contains a
count for each logtype.
function(key, values, rereduce) {
  var response = {"warning" : 0, "error": 0, "fatal" : 0 };
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values holds partial results from earlier reduce calls: add them up
      response.warning = response.warning + values[i].warning;
      response.error = response.error + values[i].error;
      response.fatal = response.fatal + values[i].fatal;
    } else {
      // values holds the log types emitted by map(): count each one
      if (values[i] == "warning") {
        response.warning++;
      }
      if (values[i] == "error") {
        response.error++;
      }
      if (values[i] == "fatal") {
        response.fatal++;
      }
    }
  }
  return response;
}
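A custom reduce() function has to return the same totals whether the values are reduced in one pass or in several partial passes that are later combined with rereduce set to true. The following sketch checks this locally; the function is given a name and declares its loop counter so that it runs standalone, and the sample values are hypothetical.

```javascript
// The reduce() logic from the text, named so it can be called locally.
function reduce(key, values, rereduce) {
  var response = { warning: 0, error: 0, fatal: 0 };
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values are partial results from earlier reduce calls
      response.warning += values[i].warning;
      response.error += values[i].error;
      response.fatal += values[i].fatal;
    } else {
      // values are the log types emitted by map()
      if (values[i] == "warning") { response.warning++; }
      if (values[i] == "error") { response.error++; }
      if (values[i] == "fatal") { response.fatal++; }
    }
  }
  return response;
}

// Hypothetical log types emitted as values for a single day.
var values = ["warning", "error", "error", "fatal", "warning", "error"];

// One pass over all values.
var direct = reduce(null, values, false);

// Two partial reductions combined by a rereduce pass.
var partials = [
  reduce(null, values.slice(0, 3), false),
  reduce(null, values.slice(3), false)
];
var combined = reduce(null, partials, true);

console.log(JSON.stringify(direct));   // {"warning":2,"error":3,"fatal":1}
console.log(JSON.stringify(combined)); // {"warning":2,"error":3,"fatal":1}
```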
When queried using a group_level of two (by
month), the following output is produced:
{"rows":[
  {"key":[2010,7], "value":{"warning":4,"error":2,"fatal":0}},
  {"key":[2010,8], "value":{"warning":4,"error":3,"fatal":0}},
  {"key":[2010,9], "value":{"warning":4,"error":6,"fatal":0}},
  {"key":[2010,10],"value":{"warning":7,"error":6,"fatal":0}},
  {"key":[2010,11],"value":{"warning":5,"error":8,"fatal":0}},
  {"key":[2010,12],"value":{"warning":2,"error":2,"fatal":0}},
  {"key":[2011,1], "value":{"warning":5,"error":1,"fatal":0}},
  {"key":[2011,2], "value":{"warning":3,"error":5,"fatal":0}},
  {"key":[2011,3], "value":{"warning":4,"error":4,"fatal":0}},
  {"key":[2011,4], "value":{"warning":3,"error":6,"fatal":0}}
  ]
}
The output includes a count for each of the log types for each month. Note that because the key includes the year, month, and day, the view also supports explicit querying by date while still supporting grouping and roll-up at the specified group level. For example, to show information from 15th November 2010 to 30th April 2011, use the following query:
?endkey=[2011,4,30]&group_level=2&startkey=[2010,11,15]
This generates the following output:
{"rows":[
  {"key":[2010,11],"value":{"warning":1,"error":8,"fatal":0}},
  {"key":[2010,12],"value":{"warning":3,"error":4,"fatal":0}},
  {"key":[2011,1],"value":{"warning":8,"error":2,"fatal":0}},
  {"key":[2011,2],"value":{"warning":4,"error":7,"fatal":0}},
  {"key":[2011,3],"value":{"warning":4,"error":4,"fatal":0}},
  {"key":[2011,4],"value":{"warning":5,"error":7,"fatal":0}}
  ]
}
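When a query like this is sent over HTTP, the key values are JSON and generally need to be URL-encoded so that the brackets and commas survive transport. The following is a minimal sketch of building the query string; rangeQuery() is a hypothetical helper, not part of any client library.

```javascript
// Builds the query string for a grouped range query. startkey and endkey are
// JavaScript arrays that are serialized to JSON and then URL-encoded.
function rangeQuery(startkey, endkey, groupLevel) {
  return "startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey)) +
    "&group_level=" + groupLevel;
}

console.log(rangeQuery([2010, 11, 15], [2011, 4, 30], 2));
// startkey=%5B2010%2C11%2C15%5D&endkey=%5B2011%2C4%2C30%5D&group_level=2
```

Because the range starts on 15th November, the first group in the output covers only the second half of that month even though it is keyed as [2010,11].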
Keep in mind that you can create multiple views to provide
different queries on your document data. In the above example,
you could create a separate view for each of the limited set of
logtype values, for example a warningsbydate view that contains
only the warning entries.
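A map() function for such a warningsbydate view might look like the following sketch. It assumes the same document fields as the earlier examples; the emit() stub and the sample documents are hypothetical and exist only so that the example runs outside the server.

```javascript
// Local stand-in for the emit() supplied by the view engine.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical map() for a warningsbydate view: only warnings are emitted,
// keyed by date, so the built-in _count reduce can total them per group level.
function map(doc, meta) {
  if (doc.logtype == "warning") {
    emit([doc.year, doc.mon, doc.day], null);
  }
}

// Hypothetical sample documents.
[
  { year: 2010, mon: 9, day: 1, logtype: "warning" },
  { year: 2010, mon: 9, day: 1, logtype: "error" },
  { year: 2010, mon: 9, day: 2, logtype: "warning" }
].forEach(function (doc) {
  map(doc, {});
});

console.log(rows.length); // 2
```

Because the log type is fixed by the view, it no longer needs to appear in the key or the value, and the custom reduce() function is not required.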