The Demands of Your Database are Changing
NoSQL technology was pioneered by leading internet companies, including Google, Facebook, Amazon, and LinkedIn, to overcome the limitations of 40-year-old relational database technology for use with modern web applications. Today, enterprises are adopting NoSQL for a growing number of use cases, a choice driven by four interrelated megatrends: Big Users, Big Data, the Internet of Things, and Cloud Computing.
Big Users
Not long ago, 1,000 daily users of an application was a lot of users, and 10,000 was an extreme case.
Today, nearly 3 billion people are connected to the internet and the amount of time they spend online — about 35 billion hours a month in 2014 — is steadily growing, creating an explosion in the number of concurrent users.
It’s not uncommon for an app to have millions of different users a day, and it must support users around the globe 24 hours a day, 365 days a year.
Supporting large numbers of concurrent users is important, but because app usage requirements are hard to predict, it’s just as important to dynamically support rapidly growing (or shrinking) numbers of concurrent users. Reasons for this user fluctuation can include:
- A newly launched app that goes viral, growing from zero to a million users overnight.
- Some users are active frequently, while others use an app a few times and never return.
- Seasonal swings like those around Christmas or Valentine’s Day can create spikes for short periods.
- New product releases or promotions can spawn dramatically higher application usage.
Large numbers of users, combined with the dynamic nature of usage patterns, are driving the need for more easily scalable database technology. With relational technologies, many application developers find it difficult, or even impossible, to get the dynamic scalability and level of scale they need while also maintaining the performance users demand.
Many are turning to NoSQL for help.
Big Data
Explosive growth in internet usage, together with mobile and social apps and machine-to-machine communications, is driving the “big data” revolution. Research firm IDC estimates that in 2013 the combined size of the world’s digital data was 4.4 zettabytes (4.4 trillion gigabytes) and that by 2020 it will grow tenfold to 44 zettabytes.
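The unit conversion and growth figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes; the constant yearly growth rate it derives is an illustrative assumption, not a figure published by IDC.

```python
ZETTABYTE = 10**21  # bytes, using decimal (SI) prefixes
GIGABYTE = 10**9    # bytes

# 4.4 ZB written with integers to avoid floating-point rounding.
size_2013_bytes = 44 * ZETTABYTE // 10
size_2013_gb = size_2013_bytes // GIGABYTE
print(f"{size_2013_gb:,} GB")

# Growing from 4.4 ZB to 44 ZB between 2013 and 2020 is a factor of ten
# over seven years; this is the constant yearly rate that would produce it.
years = 2020 - 2013
annual_growth = 10 ** (1 / years) - 1
print(f"{annual_growth:.1%} per year")
```

A tenfold increase over seven years works out to a compound rate of a little under 40% a year, assuming the growth is spread evenly.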
Data is becoming easier to capture and access through third parties such as Facebook and Dun and Bradstreet. Personal user information, geolocation data, social graphs, user-generated content, machine logging data, and sensor-generated data are just a few examples of the ever-expanding array of data being captured.
It’s not surprising that developers find increasing value in leveraging this data to enrich existing applications and create new apps. The availability of this data is rapidly changing the nature of communication, shopping, advertising, entertainment, and relationship management. Apps that don’t leverage it will quickly fall behind.
However, capturing and using big data requires a very different type of database.
Developers want a highly flexible solution that easily accommodates any new type of data they choose to work with, and isn’t disrupted by content structure changes from third-party data providers. Much of the new data is unstructured and semi-structured, so developers also need a database that can efficiently store it.
Unfortunately, the rigidly defined schema-based approach used by relational databases makes it impossible to quickly incorporate new types of data and is a poor fit for unstructured and semi-structured data.
Finally, with the rising importance of processing data, developers are increasingly frustrated with the “impedance mismatch” between the object-oriented approach they use to write applications and the schema-based structure of a relational database.
NoSQL provides a much more flexible, schemaless data model that better maps to an application’s data organization and simplifies the interaction between the application and the database, resulting in less code to write, debug, and maintain.
The Internet of Things
The volume of machine-generated data — a major contributor to the growth of big data — is increasing with the proliferation of digital telemetry and the “Internet of Things.”
Today, some 20 billion devices are connected to the internet, everything from tablets and home appliances to systems installed in cars, hospitals, and warehouses. These devices receive data on environment, location, movement, temperature, weather, and more from their 50 billion sensors.
According to research firm IDC, by 2020:
- 32 billion things will be connected to the internet
- 10% of all data will be generated by embedded systems (vs 2% today)
- 21% of the most valuable, “target rich” data will be generated by embedded systems (vs 8% today)
Innovative enterprises are leveraging the Internet of Things to develop new products and services, reduce costs and time to market, increase efficiency, eliminate waste, and boost customer satisfaction. This ability to access global, operational data in real time enables dynamic, informed decision-making and increases business agility.
However, telemetry data — which is semi-structured and continuous — poses a challenge for relational databases, which require a fixed schema and structured data.
To overcome these challenges, innovative enterprises are relying on NoSQL technology to scale concurrent data access to millions of connected devices and systems, store billions of data points, and meet the performance requirements of mission-critical infrastructure and operations.
Cloud Computing
Today, most new applications run in a public, private, or hybrid cloud, support large numbers of users, and use a three-tier internet architecture.
In the three-tier architecture, applications are accessed through a web browser or mobile app that is connected to the internet. In the cloud, a load balancer directs the incoming traffic to a scale-out tier of web/application servers that process the logic of the application.
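The load-balancing step described above can be sketched in a few lines. This is a simplified illustration that assumes a round-robin policy; the class name, method names, and server names are hypothetical, and production load balancers use more sophisticated policies and health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers from a pool in rotating order."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def add_server(self, server):
        # Scaling out: grow the pool and restart the rotation over it.
        self.servers.append(server)
        self._rotation = cycle(self.servers)

    def route(self, request):
        # Return the chosen server together with the request it will handle.
        return next(self._rotation), request


balancer = RoundRobinBalancer(["app-1", "app-2"])
before = [balancer.route(f"req-{i}")[0] for i in range(4)]

balancer.add_server("app-3")
after = [balancer.route(f"req-{i}")[0] for i in range(3)]
print(before, after)
```

Adding a server changes nothing for callers: they keep calling route, and the new server simply joins the rotation.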
The scale-out architecture at the web/application tier works beautifully. For every 10,000 (or however many) new concurrent users, you simply add another commodity server to the web/application tier to absorb the load. At the database tier, relational databases were originally the popular choice. Their use is increasingly problematic, however, because they are a centralized, shared-everything technology that scales up rather than out. This makes them a poor fit for applications that require easy and dynamic scalability.
NoSQL databases are built from the ground up to be distributed, scale-out technologies and are therefore a better fit with the highly distributed nature of the three-tier internet architecture.
So, Should You Adopt NoSQL, or Adapt Your RDBMS?
NoSQL's More Flexible Data Model
Relational and NoSQL data models are very different. The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well.
When looking up data, the desired information has to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application. Similarly, when writing data, the write needs to be coordinated and performed on many tables.
NoSQL databases have a very different model. For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format.
Each JSON document can be thought of as an object used by your application. A JSON document might take all the data for a record that is spread across 20 tables of a relational database and aggregate it into a single document/object.
Aggregating this information may lead to duplication, but storage is no longer cost prohibitive. The resulting data model’s flexibility, the efficiency with which documents can be distributed, and the improvement in read and write performance make it an easy trade-off for web-based applications.
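To make the aggregation concrete, here is a minimal sketch in Python. The tables, field names, and the build_document helper are hypothetical and chosen only for illustration; in-memory lists stand in for relational tables, and three tables stand in for the many an enterprise schema would have.

```python
import json

# Hypothetical relational rows: one customer row plus related rows in
# two other tables, linked by the foreign key customer_id.
customers = [{"customer_id": 1, "name": "Ada Lovelace"}]
addresses = [{"customer_id": 1, "city": "London", "kind": "home"}]
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.5},
]


def build_document(customer_id):
    """Aggregate every row that belongs to one customer into a single dict."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    return {
        "type": "customer",
        "customer_id": customer_id,
        "name": customer["name"],
        "addresses": [
            {"kind": a["kind"], "city": a["city"]}
            for a in addresses if a["customer_id"] == customer_id
        ],
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders if o["customer_id"] == customer_id
        ],
    }


document = build_document(1)
print(json.dumps(document, indent=2))
```

The application can now read or write the whole customer in one operation instead of joining or updating three tables.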
Another major difference is that relational technologies have rigid schemas while NoSQL models are schemaless.
Relational technology requires strict definition of a schema prior to storing any data in a database. Changing the schema once data is inserted is extremely disruptive and frequently avoided, which is a problem in the Big Data era when application developers need to constantly and rapidly incorporate new types of data to enrich their apps.
In comparison, document databases are schemaless, allowing you to freely add fields to JSON documents without having to first define changes. The format of the data being inserted can be changed at any time, without application disruption.
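The idea can be illustrated with a small sketch. A plain Python dict stands in for a document store here; it is not a real database client, and the save and load helpers and the key format are assumptions made for the example.

```python
import json

# A plain dict stands in for a document store: key -> JSON text.
# This is an illustration only, not a real database client.
store = {}


def save(key, document):
    store[key] = json.dumps(document)


def load(key):
    return json.loads(store[key])


# The first document is written with two fields.
save("user::1", {"name": "Ada", "email": "ada@example.com"})

# A later document adds a field; nothing had to be declared or migrated first.
save("user::2", {"name": "Grace", "email": "grace@example.com",
                 "location": {"lat": 38.9, "lon": -77.0}})

# Readers tolerate the older shape by supplying a default for missing fields.
locations = {key: load(key).get("location") for key in store}
print(locations)
```

The flexibility moves responsibility into the application: code that reads documents has to cope with fields that older documents do not have.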
Scalability and Performance Advantages
To deal with the increase in concurrent users and the volume of data, applications and their underlying databases need to scale using one of two choices: scale up or scale out.
Scaling up implies a centralized approach that relies on bigger and bigger servers. Scaling out implies a distributed approach that leverages many standard, commodity (physical or virtual) servers.
The Limits to Scaling Up with Relational Technology
At the web/application tier of the three-tier internet architecture, a scale-out approach has been the default for many years and has worked extremely well.
As more people use an application, more commodity servers are added to the web/application tier, performance is maintained by distributing load across an increased number of servers, and the cost scales linearly with the number of users.
Prior to NoSQL databases, the default scaling approach at the database tier was to scale up. This was dictated by the fundamentally centralized, shared-everything architecture of relational database technology.
To support more concurrent users and store more data, you needed a bigger server with more CPUs, more memory, and more disk storage to keep all the tables. Big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware typically used so effectively at the web/application server tier.
Also, with relational database technology, at some point the capacity of even the biggest server can be outstripped as users and data requirements continue to grow. At that point, the relational database cannot scale further and must be split across two or more servers. This introduces enormous complexities for both application development and database administration due to the inherent limitations of relational database architecture.
The Advantages of Scaling Out with NoSQL
NoSQL databases were developed from the ground up to be distributed, scale-out databases that use a cluster of standard physical or virtual servers to store data and support database operations.
To scale, additional servers are joined to the cluster, and the data and database operations are spread across the larger cluster. Since commodity servers are expected to fail from time to time, NoSQL databases are built to tolerate and recover from such failures, making them highly resilient.
NoSQL databases provide a much easier, linear approach to database scaling. If 10,000 new users start using your application, simply add another database server to your cluster. Add 10,000 more users and add another server.
There’s no need to modify the application as you scale since the application always sees a single (distributed) database.
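The linear relationship between users and servers can be written down directly. This is a back-of-the-envelope sketch that takes the 10,000-users-per-server figure from the text at face value; real capacity depends on the workload, and the servers_needed helper is hypothetical.

```python
import math

USERS_PER_SERVER = 10_000  # illustrative figure taken from the text


def servers_needed(concurrent_users):
    """Return the number of servers for a load, with a minimum of one."""
    return max(1, math.ceil(concurrent_users / USERS_PER_SERVER))


for users in (10_000, 20_000, 25_000, 1_000_000):
    print(users, "users ->", servers_needed(users), "servers")
```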
At scale, a distributed scale-out approach also usually ends up being less expensive than the scale-up alternative. This is because large, complex, fault-tolerant servers are expensive to design, build, and support.
Licensing costs of commercial relational databases can also be prohibitive because they are priced with a single server in mind. NoSQL databases, on the other hand, are generally open source, priced to operate on a cluster of servers, and relatively inexpensive.
While implementations differ, NoSQL databases share some characteristics with respect to scaling and performance:
- Auto-sharding: A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster and even across data centers, to ensure high availability and support disaster recovery.
A properly managed NoSQL database system should never need to be taken offline, for any reason, supporting 24x365 continuous operation of applications.
- Distributed query support: “Sharding” a relational database can reduce or eliminate the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds of servers.
- Integrated caching: To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory.
This behavior is transparent to the application developer and the operations team, in contrast with relational technology where a caching tier is usually a separate infrastructure tier that must be explicitly managed by the ops team.
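The auto-sharding and replication behavior listed above can be sketched with a stable hash. This is a simplified illustration, not how any particular NoSQL product works: the shard_for and placement helpers are hypothetical, and plain modulo placement is used only because it is the shortest scheme to show. Real systems typically use techniques such as consistent hashing or fixed partition maps so that adding a server moves only a fraction of the data.

```python
import hashlib


def shard_for(key, num_servers):
    """Map a key to a server index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def placement(key, num_servers, replicas=1):
    """Return the primary server index followed by replica indexes."""
    primary = shard_for(key, num_servers)
    return [(primary + i) % num_servers for i in range(replicas + 1)]


keys = [f"user::{n}" for n in range(1000)]

# Count how many keys land on each of four servers.
counts = [0] * 4
for key in keys:
    counts[shard_for(key, 4)] += 1
print(counts)

# One primary plus one replica, placed on two different servers
# (this holds as long as replicas is smaller than num_servers).
print(placement("user::42", 4, replicas=1))
```

Because the application only supplies a key, it never needs to know which server holds the data or how many servers are in the cluster.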
Choose NoSQL For Better Performance, Scalability, & Flexibility
Application needs have been changing dramatically due in large part to four megatrends:
- The growing number of users that applications must support (along with elevated user expectations for how applications should perform).
- An increase in the volume and variety of data available.
- A proliferation of machine-generated data from the Internet of Things.
- A shift to cloud computing, which relies on a distributed three-tier internet architecture.
As a result, the use of NoSQL technology is increasing among internet companies and enterprises because it offers data management capabilities that meet the needs of modern applications, including:
- Better application development productivity through a more flexible data model.
- The ability to scale out dynamically and cost effectively to support more users and big data.
- Improved performance that satisfies user expectations for highly responsive applications and allows more complex processing of data.
NoSQL is increasingly considered a viable alternative to relational databases, and should be considered particularly for interactive web and mobile applications.