Newbie rambling and revs_limit question
I'm researching options for storing data that arrives like this:
fred, apple, 2011/09/04 13:00
fred, kiwi, 2011/09/04 13:00
fred, apple, 2011/09/04 13:01
bob, apple, 2011/09/04 13:02
bob, orange, 2011/09/04 13:03
sue, plum, 2011/09/04 13:04
Basically it's source, target, timestamp. Queries against this data would be:
1) Who are the sources? (fred, bob, sue)
2) What has fred targeted? (apple=2, kiwi=1)
3) Who asked for apples? (fred, bob)
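To make the three queries concrete, here is a minimal in-memory sketch in Python. It is not tied to any database; the names `build_indexes`, `by_source` and `by_target` are made up for illustration, and the sample rows are the ones listed above.

```python
from collections import Counter, defaultdict

# Sample records from the post: (source, target, timestamp).
RECORDS = [
    ("fred", "apple", "2011/09/04 13:00"),
    ("fred", "kiwi", "2011/09/04 13:00"),
    ("fred", "apple", "2011/09/04 13:01"),
    ("bob", "apple", "2011/09/04 13:02"),
    ("bob", "orange", "2011/09/04 13:03"),
    ("sue", "plum", "2011/09/04 13:04"),
]


def build_indexes(records):
    """Aggregate raw records into per-source and per-target counters."""
    by_source = defaultdict(Counter)  # source -> {target: count}
    by_target = defaultdict(Counter)  # target -> {source: count}
    for source, target, _timestamp in records:
        by_source[source][target] += 1
        by_target[target][source] += 1
    return by_source, by_target


by_source, by_target = build_indexes(RECORDS)

sources = list(by_source)                 # query 1: who are the sources?
fred_targets = dict(by_source["fred"])    # query 2: what has fred targeted?
apple_sources = list(by_target["apple"])  # query 3: who asked for apples?
```

Both indexes only need the per-pair counts, which is why the aggregate form would be enough for these queries.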
Ideally I'd like to store every data instance, including its unique timestamp, but the sheer volume seems to make that unrealistic for my resources. Initial testing suggests that each document I store consumes 64KB of space (?). Each revision, even for a trivial amount of data, seems to cost 4KB. Fortunately, most of what I want to mine can be done using aggregate data: recording that fred->apple happened twice, plus the last time it happened, is pretty much OK.
The volume is 5000 writes per second, with a predicted 50 million unique relationships (fred->apple, fred->kiwi). I'm still trying to decide whether I need a traditional RDBMS or a NoSQL solution, but with a NoSQL approach the write volume is what makes me think Couch + caching could be what I need.
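A rough back-of-the-envelope check of those figures, as a Python sketch. It assumes 1KB = 1024 bytes and takes the 64KB-per-document and 4KB-per-revision observations at face value, even though they are only early test impressions; the constant names are made up for illustration.

```python
# Figures quoted in the post; the per-document and per-revision sizes are
# early observations, not confirmed storage characteristics.
WRITES_PER_SECOND = 5_000
UNIQUE_RELATIONSHIPS = 50_000_000
BYTES_PER_DOCUMENT = 64 * 1024  # assumes 1KB = 1024 bytes
BYTES_PER_REVISION = 4 * 1024

SECONDS_PER_DAY = 24 * 60 * 60
TIB = 1024 ** 4

# Total writes arriving in one day at the quoted rate.
writes_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY

# Space if every unique relationship is one document with one revision.
aggregate_bytes = UNIQUE_RELATIONSHIPS * BYTES_PER_DOCUMENT

# Space added per day if every write kept its own revision.
revision_bytes_per_day = writes_per_day * BYTES_PER_REVISION

print(f"writes per day:          {writes_per_day:,}")
print(f"aggregate documents:     {aggregate_bytes / TIB:.2f} TiB")
print(f"kept revisions, per day: {revision_bytes_per_day / TIB:.2f} TiB")
```

Under these assumptions the aggregate documents alone come to roughly 3 TiB, and keeping every revision would add about 1.6 TiB per day, so whether the 64KB figure is real matters a great deal.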
I've got Couchbase Server 2.0 PR2 installed and I'm just getting started with trying to see whether this is going to be a good fit.
I was planning to use something like:
"id" : "fred-apple",
"count" : 1,
… and then set _revs_limit to 1. I would write an update handler that basically did the following:
1) Check whether the "fred-apple" document exists; if not, create it and set count to 1.
2) If it does exist, update the existing document and increment the counter.
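The two steps above can be sketched in Python against a plain dict standing in for the database. This is not the server's update handler API; `record_relationship`, `store` and the `last_seen` field are hypothetical names used only to illustrate the create-or-increment logic.

```python
def record_relationship(store, source, target, timestamp):
    """Create the aggregate document on first sight, otherwise bump its count."""
    doc_id = f"{source}-{target}"
    doc = store.get(doc_id)
    if doc is None:
        # Step 1: no document yet, so create one with count 1.
        doc = {"id": doc_id, "count": 1, "last_seen": timestamp}
    else:
        # Step 2: document exists, so increment and record the latest time.
        doc["count"] += 1
        doc["last_seen"] = timestamp  # hypothetical field, not in the post
    store[doc_id] = doc
    return doc


store = {}  # in-memory stand-in for the database
record_relationship(store, "fred", "apple", "2011/09/04 13:00")
record_relationship(store, "fred", "kiwi", "2011/09/04 13:00")
record_relationship(store, "fred", "apple", "2011/09/04 13:01")
```

In a real store this read-modify-write would need to be atomic, or retried on conflict, to stay correct under concurrent writers.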
Each stored relationship should then never consume more than 64KB, since only one revision exists?
My question as it pertains to this actual forum: I am getting "error unknown not_implemented" when trying to set the revs_limit. Is that because the API isn't complete? Or should this work?
On a mostly unrelated note, am I going to be able to build views that use data in the ID field and the server timestamp for the stored documents?
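As an illustration of the ID part of that question only, here is a Python sketch of the kind of map step such a view would need. It is a stand-in, not actual view code for the server; `map_by_target` is a hypothetical name, and it assumes the ID is "source-target" with no hyphen inside the source name. The server timestamp is not covered here.

```python
def map_by_target(doc):
    """Yield (key, value) rows keyed by target, mimicking a view's map step."""
    # Split on the first hyphen only; assumes the source name has no hyphen.
    source, _, target = doc["id"].partition("-")
    yield target, source


docs = [
    {"id": "fred-apple", "count": 2},
    {"id": "fred-kiwi", "count": 1},
    {"id": "bob-apple", "count": 1},
]

# Collect every emitted row, then pick out the sources that targeted apple.
rows = [row for doc in docs for row in map_by_target(doc)]
apple_rows = [value for key, value in rows if key == "apple"]
```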
Any thoughts greatly appreciated.