Collections are a new feature in Couchbase 6.5. They let you group similar documents within each bucket, just as tables in relational databases collect similar records within each database. Collections will be fully supported in Couchbase 7.0, but you can try them right now in the Couchbase 6.5 release as a Developer Preview feature. The Demo Application already uses them.

What Are Collections?
If you are coming from the world of relational databases, you can think of collections as tables. All the documents within a Couchbase collection should be of the same type, just as all the records in a relational table are of the same type. There might be a “customer” table or a “product” table in a relational schema; similarly, there might be a “customer” collection in a Couchbase bucket.

In older versions of Couchbase, data was organized like this:

  • Cluster
    • Bucket
      • Document

In Couchbase 6.5, there are two more layers, like this:

  • Cluster
    • Bucket
      • Scope
        • Collection
          • Document

How Are Collections Useful?

Collections are the lowest level of document organization, and directly contain documents. They are useful because they let you group documents more precisely than was possible before. Rather than dumping all different types of documents (products, orders, customers) into a single bucket and distinguishing them by a type field, you can instead create a collection for each type. Once query support for collections arrives, you will be able to query against a single collection, not just the whole bucket. You will also eventually be able to control access at the collection level.

Scopes are the level of organization above collections. Scopes contain collections and collections contain documents. There are different ways to use scopes, depending on what the Couchbase cluster is being used for. If it is supporting many different internal applications for a company, each application should have a scope of its own. If the cluster is being used to serve many client organizations, each running its own copy of an application, each copy should have a scope of its own. Similarly, if a cluster is being used by dev groups, perhaps for testing, the unit of allocation should be a scope. In each case, the owner can then create whatever collections they want under the scope they are assigned.

Scope names have to be unique within their bucket, and collection names have to be unique within their scope. Accordingly, the “default” bucket could contain two scopes, “dev” and “prod”, each with its own “products” and “customers” collections.
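The naming rule can be illustrated with a small in-memory model. This is only a sketch: BucketSketch and its methods are hypothetical names made up for this example, not part of the Couchbase SDK, and the nested maps merely stand in for a real bucket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for a single bucket. NOT the Couchbase SDK:
// the class and method names are hypothetical, for illustration only.
class BucketSketch {
    // scope name -> (collection name -> (document id -> document body))
    private final Map<String, Map<String, Map<String, String>>> scopes = new LinkedHashMap<>();

    // Scope names must be unique within the bucket.
    void createScope(String scope) {
        if (scopes.containsKey(scope)) {
            throw new IllegalArgumentException("scope already exists: " + scope);
        }
        scopes.put(scope, new LinkedHashMap<>());
    }

    // Collection names must be unique within their scope only.
    void createCollection(String scope, String collection) {
        Map<String, Map<String, String>> collections = requireScope(scope);
        if (collections.containsKey(collection)) {
            throw new IllegalArgumentException(
                    "collection already exists: " + scope + "." + collection);
        }
        collections.put(collection, new LinkedHashMap<>());
    }

    void put(String scope, String collection, String id, String body) {
        requireCollection(scope, collection).put(id, body);
    }

    String get(String scope, String collection, String id) {
        return requireCollection(scope, collection).get(id);
    }

    private Map<String, Map<String, String>> requireScope(String scope) {
        Map<String, Map<String, String>> collections = scopes.get(scope);
        if (collections == null) {
            throw new IllegalArgumentException("no such scope: " + scope);
        }
        return collections;
    }

    private Map<String, String> requireCollection(String scope, String collection) {
        Map<String, String> docs = requireScope(scope).get(collection);
        if (docs == null) {
            throw new IllegalArgumentException(
                    "no such collection: " + scope + "." + collection);
        }
        return docs;
    }

    public static void main(String[] args) {
        BucketSketch bucket = new BucketSketch();
        // Both scopes get collections with the same names; they do not clash.
        for (String scope : new String[] {"dev", "prod"}) {
            bucket.createScope(scope);
            bucket.createCollection(scope, "products");
            bucket.createCollection(scope, "customers");
        }
        // The same document id can live in both scopes independently.
        bucket.put("dev", "products", "p1", "{\"name\":\"test widget\"}");
        bucket.put("prod", "products", "p1", "{\"name\":\"widget\"}");
        System.out.println(bucket.get("dev", "products", "p1"));
        System.out.println(bucket.get("prod", "products", "p1"));
    }
}
```

In this model, creating a second “dev” scope, or a second “products” collection inside “dev”, is rejected, while “dev.products” and “prod.products” coexist without trouble.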

Use of Collections

You can see collections and scopes being used in the latest version of the Couchbase Demo Application, here:

https://github.com/couchbaselabs/try-cb-java/tree/6.5.0-branch

This application uses an existing “travel-sample” bucket as is to let the user search for flights and hotels, but stores its own user and bookings data in collections. The structure being used is this:

  • Bucket: default
    • Scope: larson-travel
      • Collection: users
      • Collection: flights

When the user creates an account, a document is created in the “users” collection. When they book flights, documents are created in the “flights” collection, and referenced in the user’s document in the “users” collection.

This design allows multiple applications to share the same bucket. If we had a second instance of the demo app, used by another travel agency, we could just create another scope (with its own “users” and “flights” collections) and point the second instance at this scope by updating its application.properties file. The two instances would operate side by side, without interfering with each other.

Example Code

To begin with, the bucket and scope that contain user and bookings information are named in the application.properties file.

These configuration values are picked up in the Database.java file.
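As a rough illustration of this configuration step, here is a sketch that reads a bucket name and a scope name from properties text. The key names storage.bucket and storage.scope are assumptions made for this example and may not match the demo application's real file, and StorageConfigSketch is a hypothetical class. The sketch uses plain java.util.Properties and does not show how Database.java actually picks up the values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the two configuration values discussed in the text.
class StorageConfigSketch {
    final String bucket;
    final String scope;

    StorageConfigSketch(String bucket, String scope) {
        this.bucket = bucket;
        this.scope = scope;
    }

    // Parses properties-format text. The key names are assumptions,
    // chosen for illustration; the real application may use others.
    static StorageConfigSketch parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String bucket = props.getProperty("storage.bucket");
        String scope = props.getProperty("storage.scope");
        if (bucket == null || scope == null) {
            throw new IllegalArgumentException(
                    "storage.bucket and storage.scope are both required");
        }
        return new StorageConfigSketch(bucket, scope);
    }

    public static void main(String[] args) {
        // Bucket and scope names taken from the structure shown earlier.
        String text = "storage.bucket=default\nstorage.scope=larson-travel\n";
        StorageConfigSketch config = parse(text);
        System.out.println(config.bucket + " / " + config.scope);
    }
}
```

Pointing a second instance of the application at a different scope would then only require changing the scope value in its own properties file.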

In User.java, we see how a new flight is registered for the user. The scope bean, created above, is passed in. The username is the name the user logged in with.

The id of the user document is the username of the user. The application knows to get it from the “users” collection, in the scope used by the application.

The flights the user has booked are stored in the user document, in an array named “flights”.

We add the new flights to the existing flights.

Then we store the new version of the user document.
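The registration steps above can be sketched without a running cluster. This is an in-memory approximation, not the demo application's code: two maps stand in for the “users” and “flights” collections, RegisterFlightSketch is a hypothetical class, and generating flight ids with UUID is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// In-memory approximation of the registration flow. NOT the Couchbase SDK.
class RegisterFlightSketch {
    // Stand-ins for the "users" and "flights" collections: id -> document.
    final Map<String, Map<String, Object>> users = new HashMap<>();
    final Map<String, Map<String, Object>> flights = new HashMap<>();

    // Stores each new flight in "flights" and records its id in the user document.
    // Returns the ids of the newly stored flight documents.
    List<String> registerFlightsForUser(String username, List<Map<String, Object>> newFlights) {
        // The id of the user document is the username.
        Map<String, Object> existing = users.get(username);
        if (existing == null) {
            throw new IllegalStateException("unknown user: " + username);
        }
        Map<String, Object> userDoc = new HashMap<>(existing);

        // Booked flights are kept in the user document, in an array named "flights".
        List<String> booked = new ArrayList<>();
        Object current = userDoc.get("flights");
        if (current instanceof List) {
            for (Object id : (List<?>) current) {
                booked.add(String.valueOf(id));
            }
        }

        // Add the new flights to the existing flights.
        List<String> added = new ArrayList<>();
        for (Map<String, Object> flight : newFlights) {
            String flightId = UUID.randomUUID().toString(); // assumed id scheme
            flights.put(flightId, new HashMap<>(flight));
            booked.add(flightId);
            added.add(flightId);
        }

        // Store the new version of the user document.
        userDoc.put("flights", booked);
        users.put(username, userDoc);
        return added;
    }

    public static void main(String[] args) {
        RegisterFlightSketch app = new RegisterFlightSketch();
        app.users.put("alice", new HashMap<String, Object>());

        Map<String, Object> flight = new HashMap<>();
        flight.put("flight", "AA100");
        List<Map<String, Object>> toBook = new ArrayList<>();
        toBook.add(flight);

        List<String> ids = app.registerFlightsForUser("alice", toBook);
        System.out.println("stored flight ids: " + ids);
        System.out.println("user document: " + app.users.get("alice"));
    }
}
```

With a real cluster, the two maps would be replaced by handles to the “users” and “flights” collections obtained from the application's scope.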

Just below, we see how the flights of a user are retrieved.

Get the user document from the “users” collection.

Get the “flights” array from the user document. It contains a list of flight ids.

Retrieve each flight document from the “flights” collection by ID.
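The retrieval steps can be sketched the same way. Again this is an in-memory approximation rather than the demo application's code: GetFlightsSketch is a hypothetical class, and the maps passed in stand for the “users” and “flights” collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory approximation of the retrieval flow. NOT the Couchbase SDK.
class GetFlightsSketch {
    static List<Map<String, Object>> getFlightsForUser(
            Map<String, Map<String, Object>> users,
            Map<String, Map<String, Object>> flights,
            String username) {
        // Get the user document from the "users" collection.
        Map<String, Object> userDoc = users.get(username);
        if (userDoc == null) {
            throw new IllegalStateException("unknown user: " + username);
        }

        // Get the "flights" array from the user document; it holds flight ids.
        Object ids = userDoc.get("flights");
        if (!(ids instanceof List)) {
            return Collections.emptyList();
        }

        // Retrieve each flight document from the "flights" collection by id.
        List<Map<String, Object>> result = new ArrayList<>();
        for (Object id : (List<?>) ids) {
            Map<String, Object> flightDoc = flights.get(String.valueOf(id));
            if (flightDoc != null) {
                result.add(flightDoc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> users = new HashMap<>();
        Map<String, Map<String, Object>> flights = new HashMap<>();

        Map<String, Object> first = new HashMap<>();
        first.put("flight", "AA100");
        Map<String, Object> second = new HashMap<>();
        second.put("flight", "BA200");
        flights.put("f1", first);
        flights.put("f2", second);

        Map<String, Object> alice = new HashMap<>();
        alice.put("flights", Arrays.asList("f1", "f2"));
        users.put("alice", alice);

        System.out.println(getFlightsForUser(users, flights, "alice"));
    }
}
```

Note the cost of this design: one lookup for the user document, then one lookup per booked flight. That is the price of keeping flights in their own collection.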

Changes From Earlier Code

The code for working with users and their flights was quite different in the previous version, which didn’t use collections. There, booked flights were stored directly in the user document. The user document was stored directly in the “travel-sample” bucket. Here is the original code for the registerFlightForUser() function.

Notice the use of a prefix to mark the type of document. This isn’t necessary once collections are available.

We retrieve the array of flights, which is already in the document.

We add the flights to the array of booked flights.

And we store the user document.
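For comparison, here is an in-memory sketch of the pre-collections approach. It is not the original code: LegacyRegisterFlightSketch is a hypothetical class, a single map stands in for the bucket, and the exact prefix string “user::” is an assumption about the key convention.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory approximation of the pre-collections flow. NOT the Couchbase SDK.
class LegacyRegisterFlightSketch {
    // Without collections, every document type shares one flat key space.
    final Map<String, Map<String, Object>> bucket = new HashMap<>();

    // A key prefix marks the document type ("user::" is an assumed convention).
    static String userKey(String username) {
        return "user::" + username;
    }

    void registerFlightForUser(String username, List<Map<String, Object>> newFlights) {
        String key = userKey(username);
        Map<String, Object> existing = bucket.get(key);
        if (existing == null) {
            throw new IllegalStateException("unknown user: " + username);
        }
        Map<String, Object> userDoc = new HashMap<>(existing);

        // Retrieve the array of flights, which is already in the document.
        List<Object> booked = new ArrayList<>();
        Object current = userDoc.get("flights");
        if (current instanceof List) {
            booked.addAll((List<?>) current);
        }

        // Add the flights themselves (not ids) to the array of booked flights.
        booked.addAll(newFlights);

        // Store the user document.
        userDoc.put("flights", booked);
        bucket.put(key, userDoc);
    }

    public static void main(String[] args) {
        LegacyRegisterFlightSketch app = new LegacyRegisterFlightSketch();
        app.bucket.put(userKey("alice"), new HashMap<String, Object>());

        Map<String, Object> flight = new HashMap<>();
        flight.put("flight", "AA100");
        List<Map<String, Object>> toBook = new ArrayList<>();
        toBook.add(flight);

        app.registerFlightForUser("alice", toBook);
        System.out.println(app.bucket.get(userKey("alice")));
    }
}
```

The two differences from the collections version are visible here: the type lives in the key prefix rather than in the collection name, and the flight documents are embedded in the user document rather than stored separately and referenced by id.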

Obviously, separating the booked flights from the user isn’t terribly compelling in a toy application. But in a production application, where we are storing many types of information about each user, it would make sense to store some records outside the user document, particularly any that were large or numerous or prone to changing frequently.

Scopes and Collections Documentation

To find out more about how to work with scopes and collections directly, consult this documentation, which explains the RESTful API for working with both, the relevant CLI commands, and information about collections available from cbstats.

Summary

Collections and scopes let you organize documents within a Couchbase bucket, just as tables and schemas let you organize rows within relational databases. The current Couchbase 6.5 GA release provides early, limited support for collections and scopes as a Developer Preview feature. To get started with collections and scopes, you can start working with the Java Demo Application right now.

Resources

Download

Download Couchbase Server 6.5

Documentation

Couchbase Collections 6.5 Documentation

Couchbase Server 6.5 Release Notes

Couchbase Server 6.5 What’s New

Blogs

Introducing Collections – Developer Preview in Couchbase Server 6.5

Announcing Couchbase Server 6.5 – What’s New and Improved

All 6.5 Blogs

Author

Posted by Johan Larson

Johan Larson is a Senior Software Engineer at Couchbase. Johan's work responsibility is building an SQL-based query language for JSON data in a distributed NoSQL system.

3 Comments

  1. Hey Johan, this is a really nice feature to have, especially for applications that implement multi-tenancy on the basis of a data discriminator property. I wanted to know one thing: is it possible to execute a query on multiple scopes? For some queries, we want to get data from multiple scopes, and for other queries, data should come back from a single scope. Is that possible in Couchbase 6.5? Thanks

  2. Hi, Malik. It is not possible to write N1QL queries against collections in 6.5. Right now, N1QL queries cannot work against collections, only against buckets, as they have in the past. N1QL for collections is coming in 7.0, probably some time in 2020.

    Once N1QL works for collections, it will be possible to write queries that span multiple scopes, just as it is now possible to write N1QL queries that span multiple buckets.

  3. Since “All the documents within a couchbase collection should be of the same type, just as all the records in a relational table are of the same type”, does it imply that all documents within a collection should have the same fields?

    Thanks
