Interactive Software – Then and Now

The architecture of software systems has transformed to keep pace with the sea change in interactive software driven by the web. A modern web application can support millions of concurrent users by spreading load across a collection of application servers behind a load balancer.

As the table below shows, there are fundamental differences in the users, applications and underlying infrastructure between interactive software systems of the 1970s and those being built today.

[Table: Interactive Software, Then and Now]

Database Architecture Has Not Kept Pace

Modern web applications are built to scale out – simply add more commodity web servers behind a load balancer to support more users. Scaling out is also a core tenet of the increasingly important cloud computing model, in which virtual machine instances can be easily added or removed to match demand.

In contrast, relational database (RDBMS) technology has not fundamentally changed in over 40 years, but remains the default choice for holding data behind many web applications. Handling more users means adding a bigger server (see illustration below). Big servers are highly complex, proprietary, and disproportionately expensive, unlike the low-cost, commodity hardware in web- and cloud-based architectures.

[Illustration: Logic Scales Out]

In addition, RDBMS technology requires a strict “schema” to be defined before any data can be stored in the database. Want to start capturing new information you didn’t previously consider? Want to make rapid changes to application behavior that require changes to data formats and content? With RDBMS technology, changes like these are extremely disruptive and are therefore frequently avoided, which is the opposite of the behavior desired in a rapidly evolving business and market environment.

In an effort to address the shortcomings of RDBMS technology in modern interactive software systems, developers have adopted a number of “band aid” tactics:

Sharding

In this approach an application will implement some form of data partitioning to manually spread data across servers. While this does work to spread the load, there are undesirable consequences to the approach:

  • When you fill a shard, it is highly disruptive to re-shard.
  • You lose some of the most important benefits of the relational model, such as joins and transactions that span shards.
  • You have to create and maintain a schema on every server.
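To make the re-sharding cost concrete, here is a minimal sketch of application-level sharding. It assumes a simple "hash the key, take it modulo the number of shards" placement rule; the function names and the `user:N` key format are hypothetical, and real applications may partition differently (by range, by customer, and so on).

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)

# Hypothetical data set: 10,000 user records spread over 4 database servers.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shard when growing from 4 to 5 servers")
```

Under this placement rule a key stays put only when its hash gives the same remainder for both shard counts, so roughly four out of five keys have to be copied to a different server when a fifth shard is added. That bulk data movement, coordinated by the application rather than the database, is what makes re-sharding so disruptive.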

Denormalizing

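In this approach, data is stored in a denormalized form, in the extreme as a single opaque value per key, rather than being split across normalized tables. This allows the type of information being stored in the database to change without requiring an update to the schema, makes sharding much easier and allows for rapid changes in the data model. Unfortunately, just about all relational database functionality is lost in the process.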
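One extreme form of denormalizing can be sketched as follows: every record is serialized to JSON and stored as an opaque value in a single two-column table. The example uses Python's built-in `sqlite3` module purely as a stand-in for an RDBMS; the table name `kv` and the helper functions are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table: the database sees only a key and an opaque text value.
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def put(key: str, doc: dict) -> None:
    """Store a whole record as one serialized value."""
    conn.execute(
        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
        (key, json.dumps(doc)),
    )

def get(key: str) -> dict:
    """Fetch and deserialize a record by key."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return json.loads(row[0])

def keys_with_field(field: str) -> list:
    """Find records containing a field by scanning every row in the application."""
    found = []
    for k, v in conn.execute("SELECT k, v FROM kv ORDER BY k"):
        if field in json.loads(v):
            found.append(k)
    return found

put("user:1", {"name": "Ada"})
# A new attribute appears without any ALTER TABLE statement.
put("user:2", {"name": "Grace", "twitter": "@grace"})

print(get("user:2"))
print(keys_with_field("twitter"))
```

Adding the `twitter` field needed no schema change, but the trade-off is visible in `keys_with_field`: because the database cannot see inside the stored value (short of vendor-specific extensions), filtering, joining and constraint checking all move into application code.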

Distributed caching

This approach employs distributed caching technologies, such as Memcached, which sit in front of an RDBMS system, caching recently accessed data in memory and storing that data across any number of servers or virtual machines. Memcached and similar distributed caching technologies are useful to a point but are not a panacea and can create problems:

  • Accelerate only data reads – Memcached was designed to accelerate the reading of data by storing it in main memory, but it was not designed to permanently store data.
  • Cold cache thrash – As the application seeks but doesn’t find data in the caching tier, it is forced to read the data from the RDBMS, delaying both reads and writes, leading to application time-outs, unacceptably slow response times and user dissatisfaction.
  • Another tier to manage – Inserting another tier of infrastructure into the architecture adds more capital costs, more operational expense, more points of failure, more complexity.
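The read path described above is often called the cache-aside pattern. The sketch below illustrates it with a plain dictionary standing in for the caching tier and another dictionary standing in for the RDBMS; the `CacheAside` class and its counters are hypothetical and do not represent the Memcached client API.

```python
class CacheAside:
    """Minimal cache-aside sketch: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, store_in_db):
        self._load_from_db = load_from_db
        self._store_in_db = store_in_db
        self._cache = {}   # stand-in for the distributed cache tier
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: the request pays the full cost of a database read.
        self.misses += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def write(self, key, value):
        # Writes are not accelerated: they always go to the database.
        self._store_in_db(key, value)
        # Invalidate so the next read does not return stale data.
        self._cache.pop(key, None)

db = {"user:1": "Ada"}  # stand-in for the RDBMS
cache = CacheAside(db.__getitem__, db.__setitem__)

cache.read("user:1")             # miss: cold cache, goes to the database
cache.read("user:1")             # hit: served from memory
cache.write("user:1", "Ada L.")  # goes to the database, invalidates the entry
cache.read("user:1")             # miss again after invalidation
print(cache.hits, cache.misses)
```

Even in this toy version the limitations listed above are visible: only repeated reads benefit, every write and every miss still reaches the database, and the application now carries extra logic to keep the two tiers consistent.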

The techniques used to extend the useful scope of RDBMS technology fight symptoms but not the disease itself. Sharding, denormalizing, distributed caching and other tactics all attempt to paper over one simple fact: RDBMS technology is a forced fit for modern interactive software systems.

NoSQL Databases: Matched to Modern Interactive Software

Because vendors of RDBMS technology have little incentive to disrupt a technology generating billions of dollars for them annually, application developers were forced to take matters into their own hands. Google (Bigtable) and Amazon (Dynamo) are two leading web application developers who invented, developed and depend on their own database technologies. These “NoSQL” databases, each eschewing the relational data model, are a far better match for the needs of modern interactive software systems.

Building upon the pioneering research at these and other leading-edge organizations, commercial suppliers of NoSQL database technology have emerged to offer database technology purpose-built to enable the cost-effective management of data behind modern web and mobile applications.

NoSQL versus RDBMS

While implementations differ, NoSQL database management systems share a common set of characteristics:

  • No schema required – Data can be inserted in a NoSQL DB without first defining a rigid database schema. The format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.
  • Auto-sharding (sometimes called “elasticity”) – A NoSQL database (also known as a scale-out database) automatically spreads data across servers, without requiring applications to participate. Servers can be added to or removed from the data layer without application downtime, with data (and I/O) automatically redistributed across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery. A properly managed NoSQL database system should never need to be taken offline, for any reason, supporting 24x7x365 continuous operation of applications.
  • Distributed query support – “Sharding” an RDBMS can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds or thousands of servers.
  • Integrated caching – To reduce latency and increase sustained data throughput, advanced NoSQL database technologies cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where a caching tier is usually a separate infrastructure tier that must be developed against, deployed on separate servers and explicitly managed by the ops team.
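How a database might add a server without reshuffling most of its data can be illustrated with consistent hashing, the placement technique described for Amazon's Dynamo. The sketch below is an assumption-laden illustration, not the algorithm of any particular product: the `HashRing` class, the node names and the choice of 100 virtual nodes per server are all hypothetical.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns many small arcs of the ring to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # A key belongs to the first ring position at or after its own hash.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.server_for(k) for k in keys}

ring.add_server("node-e")
after = {k: ring.server_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fifth server")
```

With this scheme only the keys that the new server takes over are relocated, on the order of one fifth of the data here, and every relocated key lands on the new server. Because the database itself computes the placement, the application simply keeps reading and writing by key while the cluster rebalances.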

Mobile Devices: A New Data Synchronization Challenge

While web applications have been struggling with data management challenges for over ten years, mobile applications have exploded in popularity over the last couple of years, bringing data management challenges of their own.

Apple iOS applications best represent this new genre of interactive software system. Launched on July 10, 2008, the Apple App Store allows iPhone, iPod Touch and iPad users to browse and download applications built with the Apple iOS SDK. As of March 2011 the App Store offers over 300,000 unique applications, over 10 billion applications have been downloaded, and over $2 billion in payments have been made by Apple to the developers of these applications.

These “native” mobile applications (and similar applications available for Android, BlackBerry and Windows Mobile devices) execute on the device and typically store their data on the device itself, allowing them to operate whether or not the device is connected to the Internet. Unlike a web application, a native application serves a single user on a single device. It does not need to support potentially millions of concurrent users and thus doesn’t require auto-sharding, distributed query support or caching of data (data is usually stored in high-speed non-volatile memory rather than on disk media).

But these systems present a significant data synchronization challenge. Because a mobile device can be easily lost or damaged, a reliable mechanism for data backup is required. Many mobile applications also have sister web applications. A project management system, for example, may have both a native iPhone application and a web application interface. When using the native iPhone application, a user may make changes to a local copy of the data while disconnected from the Internet. When a connection is re-established, the data should be synchronized with the Web application to ensure data consistency across views.
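The offline-edit scenario above can be sketched as a two-way merge between a device copy and a server copy of the data. The sketch assumes each record carries a revision number and that the higher revision wins; the `Versioned` type, the `sync` function and the `task:N` keys are hypothetical, and production systems typically need richer conflict handling than this.

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    revision: int  # hypothetical per-record counter: the higher revision wins

def sync(local: dict, remote: dict) -> None:
    """Two-way merge: for every key, both sides end up with the newer record."""
    for key in set(local) | set(remote):
        mine, theirs = local.get(key), remote.get(key)
        if theirs is None or (mine is not None and mine.revision > theirs.revision):
            remote[key] = mine    # push the device's newer or new record
        elif mine is None or theirs.revision > mine.revision:
            local[key] = theirs   # pull the server's newer or new record
        # Equal revisions: nothing to do in this simplified model.

# Device and server start from the same task; the server also has a newer task.
device = {"task:1": Versioned("Draft plan", 1)}
server = {
    "task:1": Versioned("Draft plan", 1),
    "task:2": Versioned("Book venue", 1),
}

# The user edits the task on the phone while disconnected.
device["task:1"] = Versioned("Final plan", 2)

# Connection is re-established.
sync(device, server)
print(server["task:1"].value, "|", device["task:2"].value)
```

After the merge both copies hold the same records, so the web view and the native view agree. The sketch deliberately ignores the hard case in which the same record is edited on both sides while disconnected, which is exactly where database-level synchronization support becomes valuable.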

Some NoSQL database management systems are beginning to support synchronization of data between mobile devices and database clusters deployed in a data center (or “in the cloud”). Because many web application developers are also delivering native mobile versions of their applications, this functionality is increasingly attractive to developers evaluating database alternatives.

Open Source and Commercial Database Technologies

Unlike Google and Amazon, few companies can or should build and maintain their own database technology. But the need for a new approach is nearly universal. The vast majority of new interactive software systems are web applications with the characteristics and needs described in the previous sections of this piece. These systems are being built by organizations of all sizes and across all industries. Interactive software is fundamentally changing, and the database technology used to support these systems is changing too.

A number of commercial and open source database technologies, including Couchbase, MongoDB, Cassandra and Riak, are now available and increasingly represent the “go to” data management technology behind new interactive web applications.