Capella’s new columnar service adds real-time data analysis to help organizations build modern, real-time adaptive applications.
Today, at AWS re:Invent, Couchbase announced the Capella columnar service, a new real-time data analytics service for our cloud data platform that enables real-time, adaptive applications.
Capella columnar is a service that introduces a columnar store and extensive data integration within our database-as-a-service (DBaaS), thereby allowing for real-time data analysis on the same platform as operational application workloads. By converging operational and real-time analytic applications into one database platform, Couchbase removes friction to deliver premium customer experiences, especially those that incorporate artificial intelligence. We call these experiences adaptive applications because they offer contextualized hyper-personalization, enabled by real-time analytic calculations, among other features.
We realized that true real-time data analytics can only be performed if the calculations are processed alongside the running application, at the same time. And the results only become useful if they are immediately written back to the operational database and the applications it serves. This kind of reconciliation between real-time analytics and operational applications has confounded the database industry for decades.
Why is operationalizing real-time data analytics so difficult?
According to Forrester, 73% of data in the enterprise is NOT used in analytics, and 40% of big data projects FAILED to deliver value. Why? Because it takes too long to turn any analytic insight into an operational action, especially when both the analytic measure and the operational application are powered by multiple relational and NoSQL databases, complicated ETL, and impossible-to-understand integration points.
Real-time data analysis requires a lot of data
Analytics often consume large volumes of data, which is expensive both to persist in storage and to aggregate in computations. Calculations take too long when querying and aggregating large volumes of data, and row scans are expensive. Addressing these volume and latency problems is how the analytics industry got started.
- Analytic systems adopted OLAP “cubes” that store pre-calculated measures to avoid the expense of repeatedly running calculations.
- This is why analytic systems also adopted a more expeditious columnar storage model that allowed for querying only the specific data dimensions that were desired, rather than scanning every row across every column.
- This is also why modern analytic systems scale computation resources separately from storage resources.
- Note that Big Data, cloud computing, and streaming analytics were all centralizing forces that try to shrink data’s volume problem and make it manageable.
Yet all of this has failed to address the ability to automatically operationalize analytic measures inside applications, in order to make the applications immediately smarter.
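To make the row-versus-column point above concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not how Capella columnar is implemented, and the order records and function names are hypothetical. The same aggregate is computed twice: once by visiting every full record, and once by reading only the single column the query needs.

```python
from typing import Dict, List

# Hypothetical order records, used only for illustration.
ROWS: List[Dict[str, object]] = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "status": "shipped"},
    {"order_id": 2, "region": "APAC", "amount": 75.5, "status": "pending"},
    {"order_id": 3, "region": "EMEA", "amount": 42.25, "status": "shipped"},
]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per field (a column layout)."""
    columns: Dict[str, list] = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_from_rows(rows: List[Dict[str, object]]) -> float:
    """Row layout: every full record is visited just to read one field."""
    return sum(row["amount"] for row in rows)


def total_from_columns(columns: Dict[str, list]) -> float:
    """Column layout: only the 'amount' column is touched."""
    return sum(columns["amount"])


COLUMNS = to_columns(ROWS)
```

Both functions return the same total. In a real engine the column version reads far less data from storage when records are wide, because the unrelated fields are never loaded.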
How do you make real-time data analytics actionable?
Real-time analytic measures (aka “insights”) are still too difficult to operationalize within enterprise applications. When I worked for a business intelligence tool vendor, we used to say that we made “actionable insights” with our interactive dashboards. But actionable insights have remained a myth because there has been no way to incorporate analytic measures, derived from large and diverse amounts of enterprise data, inside applications without human intervention. It always required a person to push a button.
Most analytical systems do not write back the derived data values they calculate into operational systems. They usually present their outputs only as “dashboards”, which are great for viewing but do little to expedite action. The delay in taking action usually runs from minutes to days. This is the write-back latency gap, and it is a 50-year-old problem. This is why analytics is almost never a “real-time” activity the way a database transaction is.
Couchbase Columnar will eliminate the write-back latency gap for operational applications.
Our Capella columnar service will extend the real-time analytic capabilities available from Couchbase. The Columnar service adds the following to Capella:
- A new column-oriented, Log-Structured Merge (LSM) plus B-tree structured storage engine built to expand the analytic performance and capacity of Capella. It offers impressive compression and high-speed column-based access to its data. This serverless engine will dynamically scale while containing terabytes of analytic data. Its data is stored in AWS S3, separate from our computation features.
- We’ve enhanced our MPP-based computation engine, allowing for real-time calculations regardless of data size. This aggregation feature uses SQL++ for queries and includes the Couchbase patented cost-based optimizer to ensure exceptional performance even when queries are complex.
- Real-time ingestion capabilities powered by Apache Kafka, able to connect, capture, and extract data from nearly any database or application. This process also transforms the extracted data into developer-friendly JSON structures while in transit. Imagine real-time ingestion of BSON objects into Capella columnar!
- File-based reads, imports, and exports for data stored in AWS S3, including JSON, Parquet, Avro, CSV, and other text formats.
- Conversational coding using Capella iQ, to allow developers to use natural language interactions with ChatGPT (or eventually their favorite generative AI large language model) for SQL++ development.
- Native support for Tableau and Power BI for analytic development and visualization.
- New data APIs to read and write analytic measures back to operational applications powered by Capella. This eliminates the decades-old problem with embedding real-time data analytics into applications.
The result is the ability to execute large-scale, real-time analytic calculations whose results can be used, as soon as new data is available, in applications powered by this and Capella’s other services such as Key/Value, Query, Search, Eventing, and mobile App Services. Customers can build and deliver adaptive applications that incorporate real-time analytic results.
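The write-back idea can be pictured with a small Python sketch. This is not Capella's data API: plain in-memory dictionaries stand in for the operational database, and the customer records, event data, function names, and the platinum threshold are all assumptions made for illustration. It shows an analytic measure being computed and then written straight onto the operational record, where the application can act on it without a person pushing a button.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical operational documents keyed by customer ID.
operational_store: Dict[str, dict] = {
    "cust-1": {"name": "Ada", "tier": "gold"},
    "cust-2": {"name": "Lin", "tier": "silver"},
}

# Hypothetical purchase events that an analytic engine would aggregate.
events: List[dict] = [
    {"customer": "cust-1", "amount": 300.0},
    {"customer": "cust-2", "amount": 50.0},
    {"customer": "cust-1", "amount": 800.0},
]

PLATINUM_THRESHOLD = 1000.0  # assumed business rule, not from the source


def compute_spend(purchases: List[dict]) -> Dict[str, float]:
    """Analytic step: aggregate total spend per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for purchase in purchases:
        totals[purchase["customer"]] += purchase["amount"]
    return dict(totals)


def write_back(store: Dict[str, dict], totals: Dict[str, float]) -> None:
    """Write-back step: save each measure on the operational record and
    apply the tier rule immediately, with no human in the loop."""
    for customer_id, total in totals.items():
        doc = store[customer_id]
        doc["total_spend"] = total
        if total >= PLATINUM_THRESHOLD:
            doc["tier"] = "platinum"


write_back(operational_store, compute_spend(events))
```

After the call, the record for `cust-1` carries its computed `total_spend` and has been moved to the `platinum` tier, while `cust-2` keeps its original tier. In a real deployment the two steps would run against the analytic and operational services rather than local dictionaries.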
What makes Capella columnar better for apps that need real-time analytics?
This is the first time that a single database platform has created specialized and optimized storage containers for both transactional operational data and analytic data, best stored in a columnar format. Each storage container, Magma for operational data and columnar for analytic data, is built on a Log-Structured Merge (LSM) design, which is not only fast, but offers better compression, IO throughput and resource utilization.
This is like being able to float and fly at the same time.
Doesn’t MongoDB Atlas have this, too?
No. MongoDB has announced Column Store Indexes, which are a duplicative indexing structure over the data that persists in its single storage engine, WiredTiger. All data is still stored in WiredTiger, MongoDB’s equivalent to Couchstore. WiredTiger consumes 50% of available memory when in use, which is one of the reasons that MongoDB does not scale as efficiently as Couchbase and Capella.
Couchbase’s approach offers better performance, and better opportunities to incorporate external data for analytics without contaminating or bloating the operational data’s storage. Thus both engines work and scale their workloads independently while living in the same cluster.
What is an Adaptive Application?
An adaptive application can adjust its behavior and features in real time based on various factors, such as user preferences, environmental conditions, data inputs, or changing circumstances. The goal of adaptive applications is to provide a hyper-personalized and responsive user experience by dynamically tailoring their functionality to the specific needs and current context of the user.
Key characteristics of adaptive applications
Hyper-personalization: Adaptive applications can customize their UI, content, and functionality to match the situation, preferences, and requirements of individual users. This personalization can enhance user satisfaction and productivity.
Context-awareness: Adaptive applications can adapt based on the current context, which may include factors like location, device type, network conditions, time of day, and past and present behaviors. For example, a navigation app can adapt its route suggestions based on real-time traffic conditions and drive times.
Learning and intelligence: Adaptive applications will incorporate predictive machine learning, artificial intelligence, real-time calculations and generative AI conversations to continuously analyze user behavior and improve their ability to adapt. They can learn from past interactions to make better predictions and immediate recommendations.
Flexibility: Adaptive applications are designed to be flexible, feature-rich, and easily configurable. Data in adaptive applications should be available in flexible formats such as JSON in order to create or modify unanticipated data inputs such as enhancing account profiles with new personalization attributes, or storing conversation prompts and responses with LLMs.
Customizability: Users may have the option to adjust settings or provide feedback to help the application adapt to their needs.
Automation: Adaptive applications can automate certain tasks and processes, making it easier for users to accomplish their goals without manual intervention.
Calculation: Adaptive applications can instigate and respond to calculated data such as real-time inventory, capacity planning and estimation, or other real-time analytic metrics. This is why we needed to add the Capella columnar service.
Exceptional performance: Adaptive applications must be able to react in real time in order to avoid missing a response opportunity. Latency is the enemy of adaptive applications.
Edge and mobile-enabled: Adaptive applications will inherently be mobile and operate at the edge because of the behaviors of their users. Mobile devices will be a key location where adaptive application data originates and where it is consumed.
Situational: Adaptive applications will anticipate and execute actions that bring data and application functionality to the user, without their having requested it.
Cross-connected: Adaptive applications will cross-connect account personalization information across a user’s opted-in services, such that your bank, airline, and hotel loyalty programs will coordinate actions among themselves, like upgrading you in real time as you cross the threshold to “platinum” status.
For more information about Adaptive Applications, check out Forrester’s new market roadmap report, Translytical Architecture 2.0 Evolves To Support Distributed, Multimodel, And AI Capabilities.
Couchbase Capella’s columnar service will allow customers to:
Improve agility and performance. Capella columnar works within a Capella-powered application to enable fast, schemaless ingestion without having to perform extract, transform, and load (ETL). The service can distribute data from operational workloads to perform real-time analytics on operational data and then immediately influence application behavior with that information. In addition, the separation of compute and storage means Capella columnar can rapidly scale to meet changing application or analytical needs.
Stream ingestion from enterprise data sources in real time. With Capella columnar, operational analytics are not limited to only operational data because users can include external JSON, relational, streaming, and other datasets from SaaS applications or other database management sources. Capella columnar can analyze a true variety of data in a simple, single statement. For example, it can analyze data from Couchbase, S3, BSON, Cassandra, and MySQL all in the same statement.
Increase ease of use for developers. Capella columnar uses the same SQL++ query language across operational and real-time analytic applications. This means that developers who already know SQL can easily build applications on a single platform with a single query language instead of having to use two different query languages. The new service also features natural language-powered Capella iQ as a SQL++ co-pilot for faster coding.
Reduce complexity and cost. By converging operational and real-time analytics in one data platform, customers can achieve more with Capella at a lower total cost of ownership (TCO), instead of absorbing the cost of one database platform for operational workloads and another for near-real-time analytics. In addition, teams that convert JSON data for traditional analytic databases will no longer need to go through a complex conversion process.