We are using couchbase as our nosql store and loving it for its capabilities.
There is however an issue that we are running in with creating associations
via view collation. This can be thought of akin to a join operation.
While our data sets are confidential I am illustrating the problem with this model.
The volume of data is considerable so cannot be processed in memory.Lets say we have data on ice-creams, zip-code and average temperature of the day.
One type of document contains a zipcode to icecream mapping
and the other one has transaction data of an ice-cream being sold in a particular zip.
The problem is to be able to determine a set of top ice-creams sold by the temperature of a given day.
We crunch this corpus with a view to emit two outputs, one is a zipcode to temperature mapping , while the other
represents an ice-cream sale in a zip code. :
The view collation here is a mechanism to create an association between the ice_cream sale, the zip and the average temperature ie a join.
We have a constraint that the temperature lookup happens only once in 24 hours when the zip is first seen and that is the valid
avg temperature to use for that day. eg lookup happened at 12:00 pm on Jan 1st, next lookup does not happen till 12:00 pm Jan 2nd.
However the avg temperature that is accepted in the 1st lookup is valid only for Jan 1st and that on the 2nd lookup only for Jan 2
including the first half of the day.
Now things get complicated when I want to do the same query with a time component involved, concretely associating the average temperature of a
day with the ice-creams that were sold on that day in that zip.eg. x vanilla icecreams were sold when the average temperature for that day is 70 F
[y,m,d,zip2,ice_cream2 ] 1
This has an interesting impact on the queries, say I query for the last 1 day I cannot make any associations between the ice-cream and temperature before the
first lookup happens, since that is when the two keys align. The net effect being that I lose the ice-cream counts for that day before that temperature lookup
happens. I was wondering if any of you have faced similar issues and if you are aware of a pattern or solution so as not to lose those counts.