Couchbase, a document database, allows great flexibility in storing different types of documents in a single bucket (a bucket being the equivalent of a database). There is a frequent need to refer to documents of a similar type together; for example, an apparel retailer may want to separate all clothes from all shoes. They can do this today with Couchbase by using key prefixes or type fields, but doing so makes the application more cumbersome.
Grouping similar items into containers at the database layer not only makes the application simpler but also allows for efficiencies in data processing at its lowest levels. Further, having additional levels of containment under buckets allows for access control at a finer granularity than buckets.
This opens the door for having a more scalable multi-tenant platform with Couchbase than using buckets would allow. It is with these goals that we developed the feature referred to as ‘Collections’.
Couchbase Server 6.5 makes available a Developer Preview of Collections.
In this blog, I will describe at a high level what collections are, what use cases they enable, and the functionality they provide. Sample code is shown in this blog post by Johan.
Note: A Developer Preview feature cannot be used in production. Read detailed guidelines regarding Developer Preview here: Developer Preview Documentation.
What are Collections?
Collections are logical data containers inside a Couchbase bucket that group similar data, just like a ‘Table’ does in a relational database.
There is also another level available for data organization called a ‘Scope’, similar to a ‘Schema’ in a relational database. The namespace within each scope is independent of the others, so you can have the same collection names in different scopes. Similarly, document keys need to be unique only within a collection, so documents with the same key can exist in different collections.
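To make the namespace rules concrete, here is a minimal in-memory sketch in Python. It does not use the Couchbase SDK; the `Bucket` class, its methods, and the scope, collection and key names are all hypothetical and exist only to illustrate how keys and collection names can repeat across containers.

```python
from collections import defaultdict


class Bucket:
    """Toy model of bucket -> scope -> collection -> documents (not the SDK)."""

    def __init__(self, name):
        self.name = name
        # scopes[scope_name][collection_name][doc_key] = document
        self.scopes = defaultdict(lambda: defaultdict(dict))

    def upsert(self, scope, collection, key, doc):
        # A key only has to be unique inside its own collection.
        self.scopes[scope][collection][key] = doc

    def get(self, scope, collection, key):
        return self.scopes[scope][collection][key]


store = Bucket("products")
# Same document key in two collections of the same scope.
store.upsert("retail", "clothes", "item::1", {"name": "denim jacket"})
store.upsert("retail", "shoes", "item::1", {"name": "running shoe"})
# Same collection name reused in a different scope.
store.upsert("wholesale", "clothes", "item::1", {"name": "bulk t-shirts"})

print(store.get("retail", "clothes", "item::1"))
print(store.get("retail", "shoes", "item::1"))
print(store.get("wholesale", "clothes", "item::1"))
```

The three documents share the key `item::1` without overwriting each other, because the full address of a document is the combination of bucket, scope, collection and key.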
With this new introduction, role-based access controls can now be applied at the cluster, bucket, scope and collection levels.
Note: The Developer Preview does not include scope-level and collection-level RBAC; these will be available with the production version in Couchbase 7.0.
For a seamless upgrade, and for backward compatibility, every bucket has a ‘_default’ scope and the ‘_default’ scope has a ‘_default’ collection. The _default collection provides backward compatibility as a direct reference to the bucket will automatically map to the _default collection. Also, on upgrade, all existing data will automatically go to the _default collection.
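The fallback to `_default` can be pictured as a small name-resolution step. The sketch below is only an illustration of that idea in plain Python; `resolve_keyspace` is a hypothetical helper, not a Couchbase API, and rejecting a half-specified path is an assumption made here for simplicity rather than documented server behavior.

```python
DEFAULT_NAME = "_default"


def resolve_keyspace(bucket, scope=None, collection=None):
    """Return the (bucket, scope, collection) triple a reference points at."""
    if scope is None and collection is None:
        # A bare bucket reference maps to _default scope and _default collection.
        return (bucket, DEFAULT_NAME, DEFAULT_NAME)
    if scope is None or collection is None:
        # Assumption for this sketch: partial paths are treated as an error.
        raise ValueError("specify both scope and collection, or neither")
    return (bucket, scope, collection)


print(resolve_keyspace("products"))
print(resolve_keyspace("products", "retail", "clothes"))
```

An application written before collections existed only ever names the bucket, so under this model it keeps reading and writing the same data after an upgrade.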
While the _default collection is provided as a backward compatibility mechanism, we recommend that new applications be written using a named collection.
Simplified Data Organization with Collections
As mentioned earlier, the new logical groupings enable better data organization, similar to tables in a relational database.
- Easier mapping of relational schemas to Couchbase by creating a collection for a corresponding relational table.
- Ability to refer to similar documents as a unit for various purposes such as building an index, setting up replication, querying, backup/restore, etc.
- More scalable indexing as the data service has to only send the documents for the collection rather than the indexer receiving documents for the whole bucket and filtering them.
- Easier to write N1QL – statements can access a collection as a table directly instead of having to dynamically construct them using an attribute for the type of the document.
For example, compare queries with and without collections:
SELECT * FROM products WHERE type = 'clothes';
SELECT * FROM products.clothes;
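The difference between the two statements can also be sketched at the application level. The Python below is a rough analogy using plain dictionaries, not N1QL or the Couchbase SDK; the data and the function names `select_by_type` and `select_from_collection` are made up for illustration.

```python
# Without collections: one flat bucket, every document carries a "type" field.
flat_bucket = {
    "clothes::1": {"type": "clothes", "name": "denim jacket"},
    "shoes::1": {"type": "shoes", "name": "running shoe"},
    "clothes::2": {"type": "clothes", "name": "wool scarf"},
}


def select_by_type(bucket, doc_type):
    """Scan every document and keep those whose type field matches."""
    return [doc for doc in bucket.values() if doc.get("type") == doc_type]


# With collections: documents are already grouped, so no type field is needed.
collections = {
    "clothes": {"1": {"name": "denim jacket"}, "2": {"name": "wool scarf"}},
    "shoes": {"1": {"name": "running shoe"}},
}


def select_from_collection(grouped, name):
    """Return the documents of one collection without touching the others."""
    return list(grouped[name].values())


print([d["name"] for d in select_by_type(flat_bucket, "clothes")])
print([d["name"] for d in select_from_collection(collections, "clothes")])
```

Both calls return the same two products, but the first has to inspect every document in the bucket, while the second goes straight to the container that holds them.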
Running Multi-tenant Applications with Collections
Multi-tenant applications require varying levels of isolation between tenants and varying levels of resource sharing of the underlying infrastructure.
Within Couchbase today:
- Complete physical, security and logical isolation is achieved by deploying separate clusters, but this provides the least sharing of resources.
- Security and logical isolation are achieved with multiple buckets per cluster, but this approach is limited by the overhead per bucket.
- Placing multiple tenants in a single bucket provides the best sharing of resources but requires the application to handle any security or logical isolation.
With the introduction of collections (and grouping them into scopes), Couchbase can provide security and logical isolation at more granular levels within a bucket. You can have thousands of groupings in a single bucket, which lets you host thousands of tenants in a single cluster. In contrast, the number of buckets that can be hosted in a single cluster is limited (this limit has increased to 30 in Couchbase Server 6.5 with appropriate sizing) and is often not enough for the needs of multi-tenant applications.
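One way to picture a scope-per-tenant layout is the toy model below. It is plain Python, not the Couchbase SDK, and the access check is a stand-in for the scope-level RBAC that is not part of the Developer Preview; the `SharedBucket` class and the tenant names are hypothetical.

```python
class SharedBucket:
    """Toy model: one bucket shared by many tenants, one scope per tenant."""

    def __init__(self):
        self._scopes = {}  # tenant_id -> {collection -> {key -> doc}}

    def add_tenant(self, tenant_id):
        self._scopes.setdefault(tenant_id, {})

    def _check(self, caller, tenant_id):
        if tenant_id not in self._scopes:
            raise KeyError(f"unknown tenant scope: {tenant_id}")
        if caller != tenant_id:
            # Stand-in for scope-level access control.
            raise PermissionError(f"{caller} may not access scope {tenant_id}")

    def upsert(self, caller, tenant_id, collection, key, doc):
        self._check(caller, tenant_id)
        self._scopes[tenant_id].setdefault(collection, {})[key] = doc

    def get(self, caller, tenant_id, collection, key):
        self._check(caller, tenant_id)
        return self._scopes[tenant_id][collection][key]


bucket = SharedBucket()
for tenant in ("acme", "globex"):
    bucket.add_tenant(tenant)

# Both tenants use the same collection name and key without clashing.
bucket.upsert("acme", "acme", "orders", "order::1", {"total": 40})
bucket.upsert("globex", "globex", "orders", "order::1", {"total": 75})

print(bucket.get("acme", "acme", "orders", "order::1"))
```

Because isolation is enforced at the scope boundary in this model, the application no longer has to encode the tenant into every key or filter on a tenant field in every query.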
Consolidating Microservices with Collections
Modern applications are often written as a suite of microservices, and a single application can comprise hundreds of microservices. While using a bucket or even a cluster per microservice is still an option, collections (and scopes) provide a more scalable way to consolidate more microservices into a single Couchbase cluster.
Multi-tenancy and microservice-based architecture are not mutually exclusive. Many multi-tenant applications are written using a microservices architecture. With buckets, scopes, and collections, you now have several levels of containment available, which gives you flexibility in how you map tenants, microservices and tables.
Functionality availability in the Developer Preview
Once you have turned on the Developer Preview switch in a Couchbase 6.5 cluster (Developer Preview Documentation), you can start using collections and scopes. A few of the key scope and collection features of the DP functionality include:
- Scopes and collections are supported in both Ephemeral and Couchbase buckets.
- All Couchbase SDKs support DDL and CRUD operations on them.
- You can create and drop them from the SDK, REST API or couchbase-cli.
- You can perform all CRUD operations on a collection (including subdoc).
- The item count of each collection is available with cbstats.
- The DCP protocol is enhanced to stream a single scope or a single collection (in addition to the existing ability to stream a single bucket).
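The streaming item in the list above can be thought of as a filter applied to a bucket-wide change feed. The sketch below is conceptual only: it does not speak DCP, and the `Mutation` type and `stream` function are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Mutation(NamedTuple):
    scope: str
    collection: str
    key: str


def stream(
    mutations: Iterable[Mutation],
    scope: Optional[str] = None,
    collection: Optional[str] = None,
) -> Iterator[Mutation]:
    """Yield the whole feed, one scope, or one collection within a scope."""
    for m in mutations:
        if scope is not None and m.scope != scope:
            continue
        if collection is not None and m.collection != collection:
            continue
        yield m


feed = [
    Mutation("retail", "clothes", "item::1"),
    Mutation("retail", "shoes", "item::2"),
    Mutation("wholesale", "clothes", "item::3"),
]

print(len(list(stream(feed))))                                # whole bucket
print([m.key for m in stream(feed, scope="retail")])          # one scope
print([m.key for m in stream(feed, "retail", "clothes")])     # one collection
```

A consumer that only cares about one collection receives just those mutations instead of receiving the whole bucket and discarding most of it.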
Note: The Developer Preview is primarily for Key-Value access. RBAC will be available later. Integration with XDCR, Indexing, N1QL, Eventing, Analytics and Mobile will be previewed later.
Here are some resources for you to start using the new Developer Preview features: