Is this a suitable use case for membase?
I need to store ~1 billion chunks of ~4 KB each, i.e. about 4 TB when fully deployed; data will be added incrementally. I need real-time access. Access is mostly random but sometimes sequential: the chunks string together into series (~1 million series in all). Sometimes I read a whole series at a time, but usually just the last chunk of each series. Some chunks are much smaller than 4 KB; they can be as small as 4 bytes, and maybe 25% of them are significantly smaller than 4 KB.
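For concreteness, here's roughly the key layout I have in mind (Python, using a memcached-compatible client since membase speaks the memcached protocol; the key naming and the per-series `last` pointer are just my own convention, not anything membase provides):

```python
import memcache  # python-memcached client; membase speaks the memcached protocol

mc = memcache.Client(['10.0.0.1:11211'])  # a membase node (illustrative address)

def append_chunk(series_id, seq, data):
    # One key per chunk, plus a per-series pointer to the newest chunk,
    # so "read the last chunk" is two gets instead of a scan.
    mc.set('s:%d:c:%d' % (series_id, seq), data)
    mc.set('s:%d:last' % series_id, seq)

def read_last_chunk(series_id):
    seq = mc.get('s:%d:last' % series_id)
    return None if seq is None else mc.get('s:%d:c:%d' % (series_id, seq))

def read_series(series_id):
    # Sequential read of a whole series: walk seq 0..last.
    last = mc.get('s:%d:last' % series_id)
    if last is None:
        return []
    return [mc.get('s:%d:c:%d' % (series_id, i)) for i in range(last + 1)]
```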
1. Is it a good idea to use membase for this type of scenario?
2. Does membase store values efficiently more or less regardless of size? That is, is there a minimum block size per stored item, and is the per-item overhead as small as I assume?
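The reason I ask: if the per-item overhead is on the order of tens of bytes, it matters a lot for the small chunks. My back-of-envelope, where the 120 bytes of key + metadata per item is a pure guess (confirming the real figure is part of question 2):

```python
# Back-of-envelope for per-item overhead. The 120 bytes of key + metadata
# per item is a pure ASSUMPTION -- finding the real figure is part of Q2.
n_items = 10**9
avg_value = 4 * 1024               # ~4 KB typical chunk
per_item_overhead = 120            # ASSUMED bytes (key + metadata) per item

payload = float(n_items * avg_value)
overhead = float(n_items * per_item_overhead)
print('payload : %.1f TB' % (payload / 1e12))            # ~4.1 TB
print('overhead: %.0f GB (%.1f%% of payload)' %
      (overhead / 1e9, 100.0 * overhead / payload))      # ~120 GB, ~2.9%
# For a 4-byte chunk the same 120 bytes would be ~30x the payload,
# which is why the small-chunk case worries me.
```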
My plan is to run membase on EC2 small instances (1.7 GB RAM, ~1.3 GB of it usable by membase, and a 1 TB EBS volume each), adding nodes as the data grows.
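My rough node-count math for that plan, reusing the guessed 120 bytes/item from above, and assuming membase has to keep all keys + metadata resident in RAM (an assumption about how membase works that I'd welcome corrections on):

```python
# Node-count estimate for the plan above; per-item RAM figure is the same
# guess as before, and "keys + metadata must fit in RAM" is my ASSUMPTION
# about membase -- please correct me if it can page metadata to disk.
total_data_tb = 4.0
ebs_per_node_tb = 1.0
nodes_for_disk = total_data_tb / ebs_per_node_tb     # 4, before replicas/headroom

n_items = 10**9
ram_per_item = 120.0                                 # ASSUMED bytes in RAM per item
ram_needed_gb = n_items * ram_per_item / 1e9         # ~120 GB
ram_per_node_gb = 1.3                                # usable by membase per small instance
nodes_for_ram = ram_needed_gb / ram_per_node_gb      # ~92 (!) if the assumption holds

print('disk-bound: %.0f nodes, RAM-bound: %.0f nodes' %
      (nodes_for_disk, nodes_for_ram))
```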
3. Has anyone deployed something similar? Any gotchas with the setup outlined above? My assumption is that the workload will be severely disk-seek limited, so paying for higher-CPU or higher-bandwidth EC2 instances would be wasted (rough math below). Is that right?
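For reference, the math behind the "seek limited" assumption; the ~100 random IOPS per standard EBS volume is my own estimate, not a measured or published figure:

```python
# Why I expect seeks, not CPU or network, to be the bottleneck.
iops_per_volume = 100           # ASSUMED random-read IOPS for one standard EBS volume
read_size = 4 * 1024            # ~4 KB per random chunk read
random_read_bw = iops_per_volume * read_size   # bytes/s of random reads per node

print('~%d random reads/s, ~%.1f MB/s per node' %
      (iops_per_volume, random_read_bw / 1e6))  # ~100 reads/s, ~0.4 MB/s
# 0.4 MB/s is far below even a small instance's network or CPU capacity,
# so a bigger instance type would not move the bottleneck.
```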