Use Case Questions
I, like many, am confused about the Couchbase product offerings. I recently read the blog post on the changes to expect in 2012 and, also like many, I find the product simplification and sharper focus a very welcome change. But in the shorter term, I have several specific questions as they relate to an open-source, web-based SaaS we are developing. I have read the Couchbase Server 2.0 manual, much of the wiki, and some of the blog posts, so I do hope these are not stupid questions.
Some general information about the project: our use case involves a moderate amount of per-user data (varying blocks of free-form text and unstructured citations) in addition to a shared pool of searchable data, which will initially total around 300GB and grow significantly over time. The shared data is integrated from API calls, targeted web scrapers, and licensed data. It is most amenable to bulk uploads/inserts that occur irregularly, but it will be accessed frequently by users and forms the basis of most of the per-user data. Our environment is EC2. We feel our use case is well suited to document-oriented data stores, and we were leaning more toward CouchDB than MongoDB for horizontal scalability. Now we're interested in CouchDB versus Couchbase Server.
1) As I understand it, Couchbase Server is fundamentally Memcached with some additional features from CouchDB, so it would be fair to say Couchbase Server is more like Memcached with persistence than it is like CouchDB with in-memory caching. It also seems the product is converging more toward the latter as releases progress. Am I correct here?
2) While the per-user data would do well cached in memory in a cluster, as it is relatively small, it would not necessarily be feasible in the short term to cache the 300GB of shared data in memory. Understanding the performance trade-off when the data on disk exceeds available memory, is it people's experience that this is okay? Does the shared data need to be cached in memory? Or is this just not a strong use case?
3) The large pool of shared data is the basis of a search feature (location of citations) where, as I understand it, multiple views would be needed: one to search by author, another by source, etc., which are simple enough. There will also need to be the ability to search by multiple keywords (eventually with boolean operations). Combining some of these would be an important feature. To be honest, I am unsure how to approach the creation of the complex views. Can someone point me in the right direction? Or is this just not a good use case?
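To make question 3 more concrete, here is a rough Python sketch of how I currently picture it. It is only an in-memory model of map-style views, not Couchbase or CouchDB API code. The document fields (`author`, `source`, `keywords`) and all function names are placeholders I made up. Each "view" maps an emitted key to a set of document IDs, and a multi-keyword AND/OR query is answered by intersecting or uniting those sets on the client side.

```python
from collections import defaultdict

# Hypothetical citation documents; the field names are assumptions.
DOCS = {
    "c1": {"author": "Smith", "source": "Nature", "keywords": ["genome", "yeast"]},
    "c2": {"author": "Smith", "source": "Science", "keywords": ["genome", "mouse"]},
    "c3": {"author": "Jones", "source": "Nature", "keywords": ["yeast"]},
}


def map_by_author(doc):
    # One emitted key per document: the author.
    yield doc["author"]


def map_by_source(doc):
    # One emitted key per document: the source.
    yield doc["source"]


def map_by_keyword(doc):
    # One emitted key per keyword, so a document can appear under many keys.
    for keyword in doc["keywords"]:
        yield keyword


def build_view(docs, map_fn):
    """Build an index of emitted key -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key in map_fn(doc):
            index[key].add(doc_id)
    return index


def query_all(index, keys):
    """Boolean AND: IDs of documents that appear under every key."""
    sets = [index.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


def query_any(index, keys):
    """Boolean OR: IDs of documents that appear under at least one key."""
    sets = [index.get(key, set()) for key in keys]
    return set.union(*sets) if sets else set()


by_author = build_view(DOCS, map_by_author)
by_keyword = build_view(DOCS, map_by_keyword)

# Keyword AND keyword.
both_keywords = query_all(by_keyword, ["genome", "yeast"])

# Author combined with a keyword: intersect results from two views.
smith_on_yeast = by_author.get("Smith", set()) & query_any(by_keyword, ["yeast"])
```

What I do not know is whether combining results from several views on the client like this is the intended approach, or whether a single, more complex view is expected to do the work.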
4) If there are five different ways of obtaining the same document, i.e. five different views, is that document stored in memory and on disk five times?
5) As I understand it, the default object size limit is 1 MB, meaning any key's value must be relatively small. We will be storing text-based documents for our users that, with metadata, could exceed that size. Is this limit adjustable and, if so, what are the consequences of increasing it? We can also store the text files in another location (like S3) with the value containing a reference, but this increases latency.
6) Similarly, I read that bucket size is limited to 25MB, but I am not sure I understand what this really means. If we have 300GB of data, do we need to create roughly 12,000 buckets and somehow programmatically partition the data? I'm sure I'm misunderstanding something here.
7) With CouchDB, we liked the idea of incorporating some of the business logic in the database through the use of validation functions and even custom query servers. Is this possible with Couchbase Server?
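For reference, the S3 fallback I have in mind for question 5 looks roughly like the sketch below. Everything in it is an assumption on my part: the 1 MB figure is simply the limit as I understand it, `build_record` and `external_url_for` are hypothetical names, and the actual upload of the text to external storage is not shown.

```python
import json

# The 1 MB value limit as I understand it; treat this as an assumption.
MAX_VALUE_BYTES = 1024 * 1024


def build_record(doc_id, text, metadata, external_url_for):
    """Return the value to store under doc_id.

    If the JSON-serialized document fits under the limit, the text is kept
    inline. Otherwise only the metadata plus a reference is returned, and the
    text itself would have to be uploaded elsewhere (not shown here).
    """
    inline = dict(metadata, text=text)
    encoded = json.dumps(inline).encode("utf-8")
    if len(encoded) <= MAX_VALUE_BYTES:
        return inline
    # Too large: keep a pointer to the external copy instead of the text.
    return dict(metadata, text_ref=external_url_for(doc_id))


def example_url(doc_id):
    # Hypothetical URL scheme, for illustration only.
    return "s3://example-bucket/" + doc_id


small = build_record("d1", "short text", {"title": "Note"}, example_url)
large = build_record("d2", "x" * (2 * 1024 * 1024), {"title": "Thesis"}, example_url)
```

Here `small` keeps its text inline, while `large` carries only a `text_ref`, so reading it costs a second round trip. That extra latency is what I would like to avoid if the limit can safely be raised.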
9) We will be creating native mobile applications, so Couchbase Mobile intrigued us. The data replicated between the mobile device and the cloud will obviously be just the user's data, which I assume means a filtered replication. Can someone confirm?
10) Lastly, once Couchbase Mobile becomes an "add-on" to Couchbase Server in the future, will it adopt a similar architecture and feature set, or will it still be like CouchDB on a mobile device?
I appreciate you bearing with me as this was a long post. In advance, many thanks for the help!