Couchbase for Content and Metadata Stores

Whether you're building an online publication, a product catalog or a third-party data aggregation app, a NoSQL document database is a great match for your platform. Specifically, Couchbase Server provides a flexible data model, easy scalability, consistent high performance and full-text search integration that meet the requirements of customized content applications.

The Challenges

Content and metadata stores manage and serve many kinds of data that can vary widely in structure, change constantly, and be used by a growing number of content consumers. If you’re considering building a content and metadata store application, you might face a few challenges:

  • Storing unstructured content and metadata. A large amount of data today is unstructured. For example, if you’re building a content catalog, you may store tens of millions of different objects – unstructured content and metadata that may be hierarchical, sparse, free-form text, or of varying length. Your data model needs to let you add new attributes quickly, without the complexity and time it takes to change a schema, so that development cycles stay fast and efficient.
  • Scaling quickly to support millions of users. Workloads for content and metadata stores can often be spiky and unpredictable. Your database should be able to scale out rapidly and support potentially millions of concurrent users around the world without any app changes or downtime.
  • High-performance interactive apps. It is no longer sufficient to serve only static content. Users demand dynamic content and a great user experience personalized to their interests. This translates to two main requirements at the database layer – consistent low latency and high throughput. For example, if your content app includes social media capabilities, your database may need to handle high-volume data ingestion.
  • Searching across your dataset. With rich content apps that store large amounts of content, you need the ability to search content in different ways. Integrating your database with a full-text search engine lets your users search across all their data based on terms or relevancy.

The Solution

To build next-generation content and metadata stores, leading organizations such as McGraw-Hill Education rely on Couchbase Server. With a combination of full-text search, real-time analytics, and indexing and querying, you can build richer and more powerful apps using Couchbase Server.

  • Manage a wide variety of data with our flexible data model. Our flexible JSON document model can accommodate varied data and metadata requirements without any changes or downtime to your app. It provides an intuitive approach that allows you to model documents that represent articles, website landing pages and other digital content like e-books, magazines and research material. Additionally, using features like indexing, querying and incremental MapReduce, you can do simple real-time analytics on your content and content attributes. This can be useful if you want to generate a list of recently published content or popular categories in your content store.
  • Easy scalability helps you manage spiky workloads. Content and metadata store workloads are often spiky and unpredictable. As demand on your content store grows, you need to easily scale your database without any app changes or downtime. Plus, global content consumers such as students and other digital content readers expect quick access to content across multiple devices and platforms. With us, it is easy to scale your database as needed, whether within a cluster or across multiple geographical datacenters. With a single click of a button and zero changes to your app, you can grow your cluster from 1 to 25 to hundreds of servers as you see more demand. Because every server in the Couchbase cluster is identical, this architecture scales out linearly without any single point of failure. With cross datacenter replication (XDCR), you can scale across geographies and bring data closer to your users for faster access.
  • Consistent high performance ensures a great user experience. Content and metadata stores are highly interactive applications with demanding users. Today’s content consumers are no longer satisfied with simple static content; they expect to interact with it. Typically, in content and metadata stores, metadata is heavily read, while user and content stats are updated whenever a user accesses some content. At the database layer, this calls for low latency for metadata access and high throughput for concurrent stats updates. Our built-in object managed cache provides consistent low latency 99% of the time and can handle hundreds of thousands of document operations every second. Fine-grained document-level locking boosts request throughput to support millions of concurrent users with fewer servers.
  • Make sure your content is always on, 24x365. With smart devices, people are consuming content everywhere, all the time. To make sure your content is always available to users, your platform has to run 24x365 with no downtime – not for hardware changes, software upgrades or anything else. Couchbase allows you to do rolling upgrades to the next software version without any downtime. You can easily add or remove servers in the cluster without affecting the availability of your app. You can do maintenance operations like backup, restore and compaction online without any downtime to your app. Even if your entire cluster goes down, your app can stay up and running using cross datacenter replication (XDCR), which provides active-active geo-replication between two Couchbase clusters.
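
As a rough illustration of the flexible document model and the simple analytics mentioned above, the following Python sketch stores differently shaped content documents side by side and derives "popular categories" and "recently published" lists with a map step and a reduce step. It uses plain in-memory dictionaries as a stand-in for the database; it does not call the Couchbase SDK or views, and the document IDs and field names (type, category, published and so on) are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical content documents. Each one carries only the attributes it
# needs; there is no shared schema to migrate when a new field appears.
DOCS = {
    "article::1": {"type": "article", "title": "Intro to NoSQL",
                   "category": "databases", "published": "2013-05-01"},
    "ebook::7": {"type": "ebook", "title": "Scaling Out",
                 "category": "databases", "published": "2013-06-12",
                 "pages": 212},
    "page::home": {"type": "landing_page", "title": "Welcome",
                   "published": "2013-04-20"},
    "article::2": {"type": "article", "title": "Search Basics",
                   "category": "search", "published": "2013-06-30",
                   "tags": ["fts"]},
}


def map_category(doc_id, doc):
    """Map step: emit (category, 1) only for documents that have a category."""
    if "category" in doc:
        yield doc["category"], 1


def popular_categories(docs):
    """Reduce step: sum the emitted counts per category, most popular first."""
    counts = Counter()
    for doc_id, doc in docs.items():
        for key, value in map_category(doc_id, doc):
            counts[key] += value
    return counts.most_common()


def recently_published(docs, limit=3):
    """Return document IDs ordered by ISO-8601 publish date, newest first."""
    dated = [(doc["published"], doc_id)
             for doc_id, doc in docs.items() if "published" in doc]
    return [doc_id for _, doc_id in sorted(dated, reverse=True)[:limit]]
```

Note that the landing page has no category and the e-book has an extra pages attribute; the map step simply skips or ignores what it does not need, which is the property a schema-free model is meant to provide.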

Full-Text Search in Content and Metadata Stores

Couchbase integrates with Elasticsearch, a full-text search engine, to provide search functionality for your apps. The combined solution provides you with real-time distributed full-text search for your JSON documents stored in Couchbase. Elasticsearch also has a rich query domain-specific language (DSL) for simple and advanced search query features like boosting and faceted navigation. Learn more about how you can integrate Couchbase Server and Elasticsearch using the Couchbase Server Plug-in for Elasticsearch.
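
To make the query DSL features above more concrete, here is a minimal sketch that builds a search request body combining field boosting with a terms aggregation, which is one common way to drive faceted navigation. The field names (title, body, category) are hypothetical, the path under which the plug-in indexes your document fields may differ in your deployment, and the exact syntax depends on your Elasticsearch version, so treat this as an illustration rather than a ready-made query.

```python
import json


def build_search_request(text, size=10):
    """Build an Elasticsearch-style request body as a plain dictionary.

    The field names below are assumptions for illustration only.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "title^3" boosts matches in the title over matches in the body.
                "fields": ["title^3", "body"],
            }
        },
        # A terms aggregation returns per-category counts that an app
        # can render as facets next to the search results.
        "aggs": {
            "by_category": {"terms": {"field": "category"}}
        },
    }


def to_json(request_body):
    """Serialize the request body for sending over HTTP."""
    return json.dumps(request_body, indent=2, sort_keys=True)
```

The resulting JSON would be sent to the search endpoint of the Elasticsearch index that receives the replicated documents.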

Reference Architecture for Content and Metadata Stores

As shown below, content and metadata stores use a combination of a database, full-text search engine and a content delivery network (CDN).

Couchbase’s flexible JSON document model, easy scalability and high performance make it suitable for storing frequently accessed content and metadata. A full-text search engine (Elasticsearch) provides full-text search capabilities across the documents. External storage systems like Amazon S3 or a CDN are used for storing larger content like audio and video files.

Couchbase as a Content Store (Architecture)

The application uses the Couchbase Server SDKs to query Couchbase Server and the Elasticsearch query DSL APIs to query Elasticsearch. To search across all your documents, the application sends a query DSL request to Elasticsearch, which returns the IDs of the relevant documents. The application then uses those IDs to read the full documents directly from Couchbase. Larger content such as audio, video and large documents can be served to the app directly from the CDN.
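
The search-then-fetch flow described above can be sketched as follows. Both the search index and the content store are in-memory stand-ins written for this example, not the Elasticsearch or Couchbase client libraries, and the document IDs, fields and CDN URL are hypothetical. The point is only the shape of the flow: search returns IDs, and the documents themselves are read by key.

```python
# Stand-in for the text indexed by the search engine, keyed by document ID.
SEARCH_INDEX = {
    "article::1": "couchbase scales out and couchbase stays fast",
    "article::2": "full-text search with couchbase",
    "video::9": "intro video",
}

# Stand-in for the document database. Large media is referenced by URL
# and would be served from a CDN rather than stored in the document.
CONTENT_STORE = {
    "article::1": {"title": "Scaling Out"},
    "article::2": {"title": "Search Basics"},
    "video::9": {"title": "Intro",
                 "media_url": "https://cdn.example.com/video9.mp4"},
}


def search_ids(index, term):
    """Step 1: return IDs of matching documents, best match first."""
    term = term.lower()
    scored = []
    for doc_id, text in index.items():
        hits = text.lower().split().count(term)
        if hits:
            scored.append((hits, doc_id))
    # Highest hit count first; ties are broken by ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def fetch_documents(store, doc_ids):
    """Step 2: read documents by key, skipping IDs that no longer exist."""
    return [store[doc_id] for doc_id in doc_ids if doc_id in store]


def search_content(index, store, term):
    """Run the full flow: search for IDs, then fetch the documents."""
    return fetch_documents(store, search_ids(index, term))
```

Skipping missing IDs in the second step is a deliberate choice in this sketch: a search index that is updated by replication may briefly list a document that has already been deleted from the database.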

The plug-in is installed on the Elasticsearch cluster and uses Couchbase’s cross datacenter replication to propagate document updates made in Couchbase in real time to Elasticsearch.