Modern Big Data > Hadoop

That’s right. A modern big data solution requires more than Hadoop. Welcome to the data: it’s all big and fast.

Welcome to Big Data Central

I’m excited to announce that Big Data Central is live!

It represents my big data story for Couchbase. It’s about the role of NoSQL databases in a world of big data.

There was a time when big data meant Hadoop, and Hadoop meant offline analytics. That’s no longer the case. Big data is now a solution, one that includes Hadoop but is not only Hadoop. It meets both real-time and offline analytical requirements, and it meets both analytical and operational requirements.

The big data ecosystem now includes Storm for real-time processing, Couchbase Server for high performance data access, Hadoop for offline analytics, and more!

There are three big data challenges:

  1. The amount of data being generated, data volume.
  2. The rate at which data is being generated, data velocity.
  3. The rate at which information must be generated, information velocity.

Hadoop addresses data volume. It can store and process a lot of data, later. It scales out to store and process more data. Hadoop does not address data velocity, but it does meet offline analytical requirements.

Couchbase Server addresses data velocity. It is a high-performance NoSQL database that can store a lot of data, now. It scales out to store a lot of data, faster. It stores and processes data at rest. Couchbase Server does not address information velocity, but it does meet operational requirements.

Storm addresses information velocity. It can process a real-time stream of data. It scales out to process streams of data, faster. It does not store data; it processes data in motion. Storm does not address data volume or data velocity, but it does meet real-time analytical requirements.

All three big data challenges can be met by integrating Storm, Couchbase Server, and Hadoop. By integrating Couchbase Server with Storm, a real-time stream of data can be processed and stored. By integrating Couchbase Server with Hadoop, a lot of data can be processed offline.
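To make the division of labor concrete, here is a minimal, self-contained sketch of the three roles working together. It uses no Storm, Couchbase, or Hadoop APIs. The classes `StreamProcessor` and `OperationalStore`, the function `batch_job`, and the sample click events are all hypothetical stand-ins invented for illustration, and a real integration would look quite different.

```python
from collections import Counter

# Hypothetical sample data: (user, page) click events arriving as a stream.
EVENTS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/pricing"),
    ("carol", "/docs"),
    ("bob", "/docs"),
]


class StreamProcessor:
    """Stand-in for the Storm role: process data in motion, store nothing durable."""

    def __init__(self):
        # Running counts are updated per event (information velocity).
        self.page_views = Counter()

    def process(self, event):
        _user, page = event
        self.page_views[page] += 1
        return event  # pass the event on so it can be stored


class OperationalStore:
    """Stand-in for the Couchbase Server role: fast keyed writes and reads."""

    def __init__(self):
        self.docs = {}

    def upsert(self, event):
        user, page = event
        self.docs.setdefault(user, []).append(page)

    def get(self, user):
        return self.docs.get(user, [])


def batch_job(store):
    """Stand-in for the Hadoop role: scan everything stored, later and offline."""
    return {user: len(pages) for user, pages in store.docs.items()}


def run_pipeline(events):
    stream = StreamProcessor()
    store = OperationalStore()
    for event in events:
        # Stream processing and storage happen as each event arrives.
        store.upsert(stream.process(event))
    # The batch analysis runs afterwards, over all data at rest.
    return stream.page_views, store, batch_job(store)


if __name__ == "__main__":
    views, store, totals = run_pipeline(EVENTS)
    print("real-time page views:", dict(views))
    print("operational lookup for alice:", store.get("alice"))
    print("offline totals per user:", totals)
```

The point of the sketch is only the shape of the flow: each event is processed in motion, written to a store that serves keyed lookups immediately, and later scanned in bulk for offline analysis.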

Author

Shane K Johnson was the Director of Product Marketing at Couchbase. Prior to Couchbase, he held various roles in development and evangelism, with a background in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.

5 Responses

  1. DataH

    Shane, very nice article on Big Data. With the explosion of big data, companies are faced with data challenges in three different areas. First, you know the type of results you want from your data, but they are computationally difficult to obtain. Second, you know the questions to ask but struggle with the answers and need to do data mining to help find those answers. Third, in data exploration, you need to reveal the unknowns and look through the data for patterns and hidden relationships. The open source HPCC Systems big data processing platform can help companies with these challenges by deriving insights from massive data sets quickly and simply. Designed by data scientists, it is a complete integrated solution from data ingestion and data processing to data delivery. Its built-in Machine Learning Library and Matrix processing algorithms can assist with business intelligence and predictive analytics. More at https://hpccsystems.com

  2. Diwakar

    Yes, to understand modern Hadoop, everyone needs to learn Apache Storm, Spark, MapReduce, HBase, etc.

    Apache Storm is an open source engine that can process data in real time using its distributed architecture. Storm is simple and flexible. It can be used with any programming language of your choice.

    Let’s look at the various components of a Storm Cluster:

    1 – Nimbus node. The master node (Similar to JobTracker)

    2 – Supervisor nodes. Starts/stops workers & communicates with Nimbus through Zookeeper

    3 – ZooKeeper nodes. Coordinates the Storm cluster

    Both Spark and Storm can operate in a Hadoop cluster and access Hadoop storage. Storm-YARN is Yahoo’s open source implementation of Storm and Hadoop convergence. Spark provides native integration with Hadoop. Integration with Hadoop is achieved through YARN (NextGen MapReduce). Integrating real-time analytics with Hadoop-based systems allows for better utilization of cluster resources through computational elasticity, and being in the same cluster means that network transfers can be minimal.

    I can’t share complete information related to Hadoop, Spark, and Storm here, so please visit the links below for informative tutorials.

    For topics to learn or understand:- https://intellipaat.com/hadoop-

    For YouTube Tutorials :- https://www.youtube.com/user/i

  3. huber

    Thanks for sharing such wonderful information.
    To learn Hadoop online, please go through the link for details:
    https://www.leadonlinetraining….

  4. MindsMapped Consulting

    Good article. Loved reading about the three challenges. https://www.mindsmapped.com/big
