Scaling index reads (Performance)
I'm very excited about Couchbase 2.0, and I'm investigating the possibility of replacing Cassandra in our current stack with Membase. After running Cassandra in production for some time, we've seen its shortcomings firsthand and want to find a better solution.
The new indexing and query features in Couchbase are very appealing, but I'm curious about the performance implications of querying the indexes, and I haven't been able to find any detailed documentation describing a) the implementation or b) its current performance/scaling characteristics.
I've read somewhere that the system uses scatter-gather, without further detail, which leads me to believe the implementation works something like this:
- Each server in the cluster maintains an index of the objects residing on that server (built incrementally by the underlying CouchDB store)
- A query is sent to all servers (either via a smart client library or from inside a server) - the scatter phase - and the partial results are then gathered on one machine, irrelevant data is removed, and the final result is sent back to the client.
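To make my mental model concrete, here is a minimal sketch of the scatter-gather pattern as I imagine it. This is purely illustrative - the node names, data, and functions are all made up, and it is not based on Couchbase's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

# Hypothetical per-node indexes: each node holds a sorted (key, value) index
# covering only the documents that node owns. All data here is invented.
NODE_INDEXES = {
    "node-a": [(1, "apple"), (4, "date"), (7, "grape")],
    "node-b": [(2, "banana"), (5, "elderberry")],
    "node-c": [(3, "cherry"), (6, "fig"), (8, "honeydew")],
}

def query_node(node, start_key, end_key):
    """Scatter phase: a node scans its local sorted index for the key range."""
    return [(k, v) for k, v in NODE_INDEXES[node] if start_key <= k <= end_key]

def scatter_gather(start_key, end_key, limit=None):
    """Fan the query out to every node, then merge the sorted partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda n: query_node(n, start_key, end_key),
                            NODE_INDEXES)
        # Gather phase: k-way merge of the already-sorted per-node results,
        # then trim to the requested limit before answering the client.
        merged = list(merge(*partials))
    return merged[:limit] if limit is not None else merged

print(scatter_gather(2, 6, limit=3))
# → [(2, 'banana'), (3, 'cherry'), (4, 'date')]
```

The point of the sketch is the cost model: every query touches every node, regardless of how small the result set is, which is what prompts my questions below.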
So, actual questions:
a) Is that how it works?
b) If yes, does that not imply that index reads can never scale beyond the capabilities of the smallest box in the cluster (and also that every index read consumes processing on ALL machines in the cluster)?
c) If yes, is there a roadmap or similar for moving to some sort of distributed index where index reads don't hit the entire cluster but instead scale linearly as servers are added, like the underlying key/value system?