The key to success in big data initiatives is the ability to manage the speed, scale and structure of data at sub-millisecond latency.
Big Data is a big term. It encompasses concepts about data types, dozens of different technologies to manage those data types and the ecosystem around all those technologies. And everything in it moves fast!
Big data is quickly evolving. A classic big data solution, the most common big data technology architecture in use today, relies on importing and exporting data (typically into Hadoop) via batch processes. While this has yielded tremendous business results in the form of better customer insight and predictive analysis, it is not a real-time solution. It is slow.
As technology advances at an ever-increasing rate, so do best practices for big data solutions: a modern big data solution relies on real-time data processing via stream processing. It leverages integration with Elasticsearch, Storm, and more, enabling real-time analysis and search while meeting operational requirements. To do so, it requires a scalable, high-performance NoSQL database that fulfills operational requirements while delivering the performance that real-time analysis and search demand.
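To make the batch versus stream distinction concrete, here is a minimal Python sketch. It is not tied to Hadoop, Storm, or any product. The function names and the sample click events are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: Iterable[str]) -> Counter:
    """Classic approach: collect the whole batch, then compute once."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Modern approach: update the result as each event arrives."""
    running = Counter()
    for event in events:
        running[event] += 1
        # A fresh snapshot is available right after every event.
        yield running.copy()


# Hypothetical click events, for illustration only.
clicks = ["ad_a", "ad_b", "ad_a"]
snapshots = list(streaming_counts(clicks))
final_batch = batch_counts(clicks)
```

Both approaches end at the same totals. The difference is that the streaming version has a usable answer after every event, while the batch version has nothing until the whole job has run.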
A modern big data solution is only as fast as its slowest component. That brings us to a recent announcement by MongoDB and Cloudera. While we applaud every effort to help customers understand best practices for big data architecture, we also must address which NoSQL solution is the right piece to enable a truly fast big data architecture. A scalable, high-performance NoSQL database ensures that the operational database will not be the slowest component. A NoSQL database that’s difficult to scale and imposes heavy locks on its read and write traffic will fail to leverage the potential of a modern big data solution. This is the difference between MongoDB and Couchbase Server. Sure, MongoDB can be a part of classic big data solutions: these were not designed for real-time analytics and don’t need the speed that a modern big data solution requires. Couchbase Server can be a part of both classic big data solutions and modern big data solutions.
A classic big data solution, as mentioned earlier, is in use at many organizations today. It typically relies on integration with Hadoop. Couchbase Server integrates with Hadoop via a Cloudera-certified Sqoop connector (link).
Matt Asay cited a classic big data use case where Hadoop analyzes the crowd and a NoSQL database interacts with the individuals. The individual interactions are fed to Hadoop and the crowd analysis is fed to the NoSQL database. For Couchbase, this isn’t just a use case. It’s a customer reference. AOL leverages Hadoop and Couchbase Server in a classic big data solution to enable intelligent advertising (link).
LivePerson leverages Hadoop, Storm and Couchbase Server in its modern big data solution. The LivePerson architecture leverages both batch-oriented processing and real-time processing. LivePerson considered NoSQL databases from Couchbase, MongoDB, and DataStax. However, only Couchbase Server was able to meet their high throughput requirements.
Big Data Central is a place for the big data community to explore use cases, technologies and architectures. Discover how Couchbase customers such as LivePerson, AOL and PayPal are leveraging NoSQL and Hadoop in big data solutions, classic and modern.
I read this post expecting to see some sort of intelligent analysis. Instead what I see is marketing hype and innuendo.
Maybe you should talk about indexes being built by MapReduce jobs that return stale data after updates until the batch job runs. Or how about the append-only storage engine? Really? High-volume updates on large documents crush Couchbase.
If you want to see a server meltdown simply start hammering a Couchbase server with updates. Then you can enjoy the fact that all your reads are now inconsistent with reality and the server storage gets hot enough to fry eggs on.
Thanks for taking the time to comment. I will try to address your concerns.
This is a high-level piece, but it underscores the importance of scalability and performance. This is why we’ve demonstrated our ability to scale and perform with benchmarks. By the way, I know MongoDB set the foundation for future performance improvements with its last release. I have high expectations for its next release.
Secondary Indexes (Views) & Consistency:
By default, views are incremental and thus eventually consistent. However, clients can use the stale flag (stale=false) to ensure they are consistent. I believe MongoDB enforces data consistency in a similar manner via write concerns. It’s eventually consistent by default (acknowledged), but “majority” is an alternative.
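As a rough illustration of the trade-off described above, here is a toy Python model of an incrementally updated index. It is not the Couchbase API: the class `ToyView`, its methods, and the `city` field are hypothetical. Only the idea of a `stale` option that forces the index to catch up before answering is taken from the comment above, and the value `"ok"` is this sketch's own label for "answer from whatever the index currently holds".

```python
class ToyView:
    """Toy model of an incrementally updated secondary index (hypothetical)."""

    def __init__(self):
        self.docs = {}      # primary data: key -> document
        self.index = {}     # secondary index: city -> set of keys
        self.pending = []   # keys written since the last index update

    def write(self, key, doc):
        self.docs[key] = doc
        # The index is not touched on the write path.
        self.pending.append(key)

    def update_index(self):
        for key in self.pending:
            for keys in self.index.values():
                keys.discard(key)  # drop any old entry for this key
            city = self.docs[key]["city"]
            self.index.setdefault(city, set()).add(key)
        self.pending.clear()

    def query(self, city, stale="ok"):
        # stale="false" brings the index up to date before answering;
        # stale="ok" answers from whatever the index currently holds.
        if stale == "false":
            self.update_index()
        return sorted(self.index.get(city, set()))


view = ToyView()
view.write("u1", {"city": "Oslo"})
stale_result = view.query("Oslo")                 # index not yet updated
fresh_result = view.query("Oslo", stale="false")  # index updated first
```

In this toy model the first query misses the new document and the second one finds it. The price of the consistent read is that the query itself pays for the index update.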
Yes, we use append-only files. I think it’s safe to say that modern databases are moving away from update-in-place. This includes read-only databases, columnar databases and Google Dremel / Cloudera Impala + Parquet. These are some of the reasons why.
Consistent Performance. It\’s predictable whereas update-in-place is not.
Corruption Resiliency. The ability to restore to a previous version.
Solid State Drives. They are “read-modify-write”. They are not update-in-place.
Low Fragmentation. It is not a problem when the update size > the original size.
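The reasons above can be sketched with a toy append-only store in Python. This is an in-memory illustration, not how Couchbase Server's storage engine is implemented. The class `AppendOnlyStore` and its methods are hypothetical names.

```python
class AppendOnlyStore:
    """Toy append-only store: every write goes to the end of the log."""

    def __init__(self):
        self.log = bytearray()  # stands in for the append-only file
        self.history = {}       # key -> list of (offset, length), oldest first

    def put(self, key, value: bytes):
        # A write is a write: new or updated, larger or smaller,
        # the value is appended and nothing already written is modified.
        offset = len(self.log)
        self.log += value
        self.history.setdefault(key, []).append((offset, len(value)))

    def get(self, key, version=-1):
        # version=-1 is the latest; earlier versions remain readable
        # because old bytes stay in the log (cleanup is not modeled here).
        offset, length = self.history[key][version]
        return bytes(self.log[offset:offset + length])


store = AppendOnlyStore()
store.put("doc", b"v1")
store.put("doc", b"version-2-is-longer")  # update larger than the original
latest = store.get("doc")
previous = store.get("doc", version=0)
```

The larger update needs no in-place resizing, and the previous version is still readable. The cost, which this sketch leaves out, is that the log keeps growing until old versions are cleaned up.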
I’m not sure why you think updates are an issue. With an append-only file, a write is a write. I’d encourage you to take a look at the benchmarks. If you are aware of any benchmarks endorsed by MongoDB, it would be great if you could share them. I would love to take a look.
MongoDB & Write Concerns
Update-in-Place versus Append-Only
All the NoSQL platforms have a long way to go. Some are further ahead than others, and there is more to a database than raw performance. Traditional relational platforms are preferred for the majority of business applications as much for the depth of functionality they provide as for their performance.
It is in the area of functional depth that we are starting to see separation in the market between NoSQL platforms as the early majority players start to make their bets.
I am sure Couchbase will evolve as a platform, as will MongoDB. My initial comment was intended to point out that the draw to the blog post implied a very informed opinion piece at the very least, yet the content fell well short. Your follow-up comment was probably more in line with what I expected to read in the first place, but linking to old blog posts that are likely not as relevant as they were when they were written is a rather interesting substitution.