Here is an elasticsearch query I’m using to aggregate data. The data is denormalized and is one “table” or document.
A document just has a dept_id, dept_name, sub_dept_id, sub_dept_name, item_id, item_name, location, week_date
Each dept will have about 104 million records max. Currently we have 3 depts. We insert about 3 million documents weekly.
GET weekly_sales/_search
{
"size": 1,
"query": {
"bool": {
"filter": [
{
"range": {
"week_date": {
"format": "yyyy-MM-dd",
"gte": "2016-01-01",
"lte": "2017-02-21"
}
}
},
{
"term": {
"department_id": "4"
}
},
{
"term": {
"sub_dept_id": "20"
}
}
]
}
},
"aggs": {
"group_by_location": {
"terms": {
"size": 10000,
"field": "location"
},
"aggs": {
"group_by_item_id": {
"terms": {
"size": 10000,
"field": "item_id"
},
"aggs": {
"total_sales" :{
"sum": {
"field": "total_sales"
}
},
"agg_source": {
"top_hits": {
"size": 1,
"_source": {
"includes": ["department_id", "dept_name", "sub_dept_id", "sub_dept_name","item_id", "item_name", "location"]
}
}
}
}
}
}
}
}
}
My end result is to have something similar to(this is how elasticsearch returns data, couch might return something better or cleaner i hope):
{
"grouped_by_location": [
{
{
"location": 1,
"grouped_by_item_id: [
"item_id": 333
{
"dept_id":"1",
"dept_name":"dept1",
"sub_dept_id":"2",
"sub_dept_name":"subDeptName",
"item_name":"item333",
"total_sales_sum": 48584943
}
]
}
}
]
}