Data migration is full of challenges.

While migrating from a relational database (RDBMS) like SQL Server or Oracle, your main challenge comes at the time of denormalizing migrated documents as per the target data models. With the introduction of scopes and collections in the Couchbase Server 7.0 release it’s easy to categorize and organize documents within a bucket.

There are many options available to migrate data from SQL Server or Oracle into Couchbase, including:

In most of the cases cbimport itself is sufficient to complete the data migration from an Oracle or SQL Server database to Couchbase because cbimport can import documents into a required scope and collection

In this article, I’m going to cover how you can use N1QL queries in Couchbase 7.0 to merge/denormalize imported documents from multiple collections within a scope.

Background Information

First, let’s review the parallels between RDBMS entities and their equivalent entities in Couchbase Server 7.0:

RDBMS Entity Couchbase Equivalent
Database Bucket
Schema Scope
Table Collection
Row Document
Column Attribute

Data migration from an RDBMS into Couchbase can have two variants:

  1. One-to-One mapping: Each RDBMS table is imported into a collection in Couchbase and each row of that table becomes a JSON document. This is the simplest migration as it does not require any denormalization and the whole process can be completed using cbimport only.
  2. Many-to-One Mapping: Multiple tables from an RDBMS database combine into a single Couchbase document, or we can say normalized RDBMS tables get denormalized into Couchbase documents. To achieve denormalization, we have multiple options, such as:
    • Denormalization at source: This means exporting data from Oracle/SQL Server in JSON format and the structure of that JSON matches the target Couchbase data model. Then you import that exported JSON document into Couchbase using cbimport. However, this isn’t possible all the time, because you have to work on some complex queries to generate your target data model. Moreover, JSON export is not always supported by all databases.
    • Denormalization at middle layer: This is possible only if you are using some custom code (any programming language like C#, Java, etc.) to move data from RDBMS to Couchbase. This way, your code connects to the source database, reads data from source, modifies data as per the target Couchbase data model, and then writes data into Couchbase. However, this requires a lot of development work as you must write code for each target data model.
    • Denormalization at destination: In this option, you move data from RDBMS to Couchbase either using cbimport or by other means. Then you write some custom N1QL queries to perform denormalization at the Couchbase level. This is a suitable option in case the target data model has a maximum of three to four levels of child nesting. You can go for this option even if your nesting is more than three or four levels, but that introduces more complexity.

Walkthrough Example: Migrating Data from RDBMS to Couchbase with Denormalization at Destination Using N1QL

To demonstrate this migration and denormalization activity, I have created a sample database in SQL Server which contains five tables names: [Customer], [Address], [Order], [OrderDetails], and [ProductDetails]. Below is the database diagram to showcase relationships and column details:

An ecommerce database diagram example for Microsoft SQL Server

Below is the sample data loaded in each table:

select * from [dbo].[Customer] Customer sample data
select * from [dbo].[Address] Address sample data
select * from [dbo].[ProductDetails] Product details sample data
select * from [dbo].[Order] Order sample data
select * from [dbo].[OrderDetails] Order details sample data

Couchbase 7.0 Setup: I have installed Couchbase 7.0 on a Windows machine and my test cluster is ready. I have created one bucket named testBucket. Further, I have created a scope named dbo corresponding to the RDBMS schema and then created five collections in the dbo scope. Here I’m keeping Collection names the same as the RDBMS table names. However, that’s not mandatory; scope and collection names can be anything of your choice.

Next, I import all RDBMS tables into this scope (dbo) using cbimport.

Using the cbimport tool to import data from RDBMS to Couchbase

Let’s prepare for data import into Couchbase:

Step 1: Export table data into CSVs using the SQL Server export wizard.

SQL Server Export Wizard for CSV data

Step 2: Import documents into Couchbase using cbimport:

Open Windows Command prompt and redirect to C:\Program Files\Couchbase\Server\bin and run cbimport command. Please refer to this documentation for more information about cbimport syntax. Below is the example to import the Customer.csv file which is placed at path D:/CSVs into the dbo scope and Customer Collection. Here the document key is Customer Id picked from the CSV Column name CustomerId.

CSV data import to Couchbase using Windows

Similarly, we can import Address, ProductDetail, Order and OrderDetail CSVs into corresponding collections within the dbo scope. At this time, we have five collections in dbo scope, and each collection has documents in it (highlighted below).

Data import for scopes and collections using the Couchbase Web Console

Step 3: Let’s verify the imported data. We can do this step either by looking at the document via the UI or we can verify document structure using N1QL. I have created five primary indexes for each collection to run a N1QL query for document verification. Verification can be done by matching the number of imported documents as well as document structure, attributes and data.

Data verfication for indexes using a N1QL query

Note: To run a N1QL query, there must be some index present (primary or secondary) for that collection.

Below is the result of the N1QL query for each collection (I changed the query result view to “table”):

Select c.* from testBucket.dbo.Customer c N1QL query sample with customer data
Select a.* from testBucket.dbo.Address a N1QL query sample with address data
Select p.* from testBucket.dbo.ProductDetails N1QL query sample with product details data
Select o.* from testBucket.dbo.Order o N1QL query sample with order data
Select od.* from testBucket.dbo.OrderDetails N1QL query sample with order details data

Documents are imported successfully, but this is a one-to-one mapping, i.e., each table is imported as a collection. For tables where we need only one-to-one mapping (e.g., ProductDetails); data migration is complete at this step.

However, for tables where we need to perform denormalization there are a few more steps before we get our final data model. For example, RDBMS tables Customer, Address, Order and OrderDetails must be merged into a single document. Your target data model should look something like:

Couchbase document database data model example

Step 4: For denormalization at the target destination (e.g., Couchbase Server), we’ll follow a bottom-up approach. First we’ll merge OrderDetails into Order, then we’ll merge Order and Address into Customer.

Let’s prepare a N1QL query to merge OrderDetails into an Order document. For this, we’ll use group by and ARRAY_AGG and the query will return an array of order details grouped by Order Id.

N1QL query with merge statement, limit clause and group by clause
Note: You can use limit keyword after group by to restrict result size for faster execution on huge data.

N1QL query results for a merge statement with limit and group by clauses

Now we have the orderDetails array ready. It’s time to merge it into an Order document. We’ll use the Couchbase MERGE statement to perform this operation. Here we’re adding a new attribute named orderDetail in every Order document when Order.OrderId and OrderDetails.OrderId are matched.

An example N1QL query merge statement for Order detail to Order

Before running this query, let’s have a look at the indexes we need for it. Our previous queries should work with the primary index, but for the merge part, we must create a secondary index. Don’t worry the Couchbase index advisor will help us.

Copy the above query in the Query Workbench and click on the “Index Advisor” button. It will automatically show the index creation statement that’s needed to run this query.

The Couchbase Query Workbench Index Advisor

I am taking a reference from the index advisor and appending a WHERE clause to create a partial index on the Order collection.

Creating a partial index using a where clause in Couchbase

This partial index covers only those Order documents where the orderDetail attribute is missing. In case we have millions of Order documents, the number of documents in this index will start being reduced when we run merge statements in chunks (using the LIMIT keyword). For example, at this point we have six documents covered by this index.

Creating a secondary index on the OrderDetail collection is similar, as shown below.

Create a secondary index for order details example in Couchbase

Now we have two indexes in a ready state.

Two indexes that are ready in Couchbase

Next, let’s execute a merge statement with a limit clause. I have updated the where clause in the subquery (where od.OrderId is not missing) to select the appropriate index during execution.

A N1QL merge statement with a limit clause and where clause subquery example

This statement will update one Order document and append its associated orderDetails as a child document. In my case, it picked the Order document with orderId=2 and updated it. Order Id 2 has only one orderDetails document, so it added it as an array.

An example of a merged document array in Couchbase

Let’s have a look at the index. The document count decreased from six to five. This is because one document is not satisfying the partial index’s condition. This index will be updated as order documents are updated.

An example index and partial index update view in Couchbase

Now remove the Limit clause and run a merge query to update all order documents.

a N1QL statement for merge query remove limit clause to update documents

This statement brings the adv_OrderId index document count to zero as all Order documents now have an attribute orderDetail.

Index document count example in Couchbase

I randomly picked one order document (orderId=9) and included the output below.

An example JSON document in Couchbase

Now we have an Order document ready with its child to merge into Customer. Let’s modify our merge query accordingly and create appropriate indexes for query execution:

The index for inner query/subquery:

A create index inner query subquery N1QL example

The partial index for Customer collection:

Creating a partial index for a collection N1QL example

The merge query to merge Orders into Customer:

A merge query N1QL example for a customer order

After execution, this is the final customer document structure (randomly picked):

A customer document structure JSON example in Couchbase Server

Up to this point, we have achieved denormalization of the order, orderDetail and Customer tables. Now let’s merge address as an array into Customer as we did the others.

Creating the index for the Customer collection:

Create index for a collection example in Couchbase

Creating the index for the address collection:

Create an index for a Collection using sample address data in Couchbase Server

The merge query to merge address into Customer:

A N1QL statement to merge query addresses to customers in Couchbase

After executing this query, we will get our final data model which is made up of four RDBMS tables.

    • Customer will be the parent document.
    • Address will be an array of objects embedded in customer.
    • Order will be an array of objects embedded in customer.
    • OrderDetails will be an array of objects embedded in each order object.

Final structure of the customer document:

An ecommerce example JSON document final structure in Couchbase database

Finally, we’ve achieved denormalization as per our target data model using N1QL queries.

We can modify the sub-query to select a number of attributes/columns instead of *. Furthermore, we can use Couchbase object functions to add/remove any attribute in the subquery/parent result.

Step 5 (optional): Now it’s time to clean up. Since we have our target data model with us in the Customer collection, we can delete the other three collections (address, order and orderDetails).

Conclusion

In summary, denormalization at destination can be achieved using the N1QL query language along with appropriate indexes. It can be used for any level of nesting as per your requirement starting with the bottom-most child data in your data model.

Take Couchbase for a test drive: Download Couchbase Server 7.0 today

 

Author

Author

Posted by Gurmeet Singh

Gurmeet is working as a Technology Lead at Infosys and has more than 9 years of experience in full-stack development including front-end, backend and databases. Gurmeet holds a Bachelor's Degree in Computer Science from Punjab Technical University, India. See his LinkedIn profile for more details: https://www.linkedin.com/in/gurmeet-singh-04281381/

Leave a reply