First migration from typical relational (MySQL) to NoSQL
Hello all!
I'm dropping here this message in order to get some ideas/opinions on
a migration from a typical relational solution to a NoSQL one.
First, a little context on the situation:
- Our database has experienced a fast grown in the past few months,
and at the present moment we are achieving very fast a 1.5TB of
information with a tendency to grow event faster until the end of
2012
(triplicate this value).
- We are sincerely considering the hypothesis of migrating our
persistency system to a NoSQL one, since that MySQL is becoming the
bottleneck of our application (We replicated it, sharded it,
partitioned it, but still it won't feedback in an acceptable time).
- We already started the process of layering our application in
different services, still, we want to give a chance to NoSQL as the
persistency system and don't stick "ad eternum" to the relational
paradigm.
Requisits:
- The NoSQL flavour should be oriented to Document Store: We have the
visitor entity with a lot of attributes, sessions related to a
visitor, and actions perfomed by those visitors during a session. We
think that a document that represents this information would fit
quietly well!
- We write a lot of of records per second (sessions and actions).
- We read big big sets of data(i.e. visitor entire history).
- On the top of this, we still need to deliver accurate and on demand
(real time) analytics related to visitors and actions.
Document Model:
We have the visitor entity:
- name, telephone, email, country, etc
Sessions (each visitor has a lot of sessions, with information like
the OS, browser, creation_timestamp, duration, ip, etc);
Events(each session has a lot of events, like clicks, mouse move,
etc)
Actions(during a session, a visitor may or may not execute some
actions)
The queries should be like: Give me all the ACTIVE visitor of
countries A,B,C using Chrome as browser but not IE. Or give me all
the
visitor with a session that lasted at least X minutes.
Hardware:
- We will have 4 dedicated servers (for persistency) to this
implement
this new architecture, each one with 96Gs of RAM, SSD disks and 2
quadcores 2.4Ghz. (not quiet sure on this but it will be very
similar)
Now the question is:
We have been reading a lot about couchbase and mongodb:
- The first one gives us an easier deploy, with master to master
replication, auto sharding and balancing but without a convenient way
of making queries in order to retrieve less and concrete data.
- The second one has master-slave replication, a more difficult
deploy
and maintenance process, but a query/fetch system more
elastic than couchbase.
With your experience, can you give us some topics/ideas/opinions on
how to start this implementation?
Best Regards!