Many organizations looking to modernize their legacy databases are evaluating fully managed hosted DBaaS offerings that provide flexibility, performance and scale without compromising the trusted strengths of relational databases (RDBMS). With the introduction of Couchbase Capella DBaaS, customers can now benefit from exceptional price-performance to bring down their total cost of ownership (TCO).
However, ensuring that established data integration tools and processes can be used to move data into Couchbase Capella can be an important dependency in transitioning to cloud services. This is why we are excited to introduce the support of Couchbase 7.x in Talend Big Data Platform 8.0, covering both Couchbase Server and Capella.
Talend offers a range of data management capabilities, ranging from business process management to ETL and master data management. Talend’s new Big Data Platform 8.0 release contains an updated Couchbase connector, enabling the modernization and migration of business-critical applications.
Creating an ETL workflow for Couchbase Capella
This blog shows how to connect Talend with Couchbase Capella and perform a simple ETL workflow. Talend Big Data Platform will be leveraged to export relational data from a MySQL database into Couchbase using the JSON document format.
The workflow will read from a MySQL database table and create corresponding JSON documents in Couchbase Server. You can build upon this example and augment it to support more complex scenarios involving data transformations, complex mappings, etc.
The following steps have been tested on macOS but can be adapted for other systems.
Couchbase Capella offers a free 30-day trial and allows you to create your first cluster in a few clicks in just minutes. You can start your trial with a single click.
Just enter your name, email, and organization name. Once your email is verified, you can begin your trial. Simply select your preferred AWS region and click “Start Trial”.
The trial will create a single node Couchbase cluster in AWS. Once your cluster is created, we need to configure a data bucket for this example.
Navigate to Clusters > Trial-Cluster > Buckets > Create Bucket. We will name the bucket world.
Click Next to continue, accept the defaults and click Create Bucket. Also, create the travel-sample bucket by navigating to Tools > Import and scroll down to Travel Sample and select Import.
We will now need to configure access to the cluster from a remote workstation.
Navigate to Clusters > Trial-Cluster > Connect and scroll down to Database Access > Manage Credentials, then +Create Database Credential.
Make sure to use the dropdown under Bucket Level Access and enter world, All Scopes, and Read/Write for the settings, then click Create. Repeat for the travel-sample bucket.
Allowing local IP access to Capella
Next, we need to configure access to your local workstation as we will be running Talend Studio locally. Navigate to Connect > Manage Allowed IP. From there, you can click + Add My IP.
The Couchbase Capella configuration is complete for now.
Install MySQL sample database
Next, we need to install and/or configure the MySQL server which is also running on the local workstation. I am using macOS and Homebrew to install additional packages.
To install MySQL via Homebrew, open a terminal and type: brew install mysql .
Download and import the world database:
- Download world.sql
- Start MySQL server, if not started already, and connect to server:
sudo mysql –u root –p
- Create world database:
CREATE DATABASE world;
- Load content from world database:
- Verify the world database has been imported correctly:
- The output should show three tables: City, Country and CountryLanguage
Now it is time to install Talend Studio. You can download Talend Studio 8.0.1 from Talend Cloud. Navigate to your user and select Downloads:
Select Talend Studio and your platform from the dropdown and version 8.0.1.
Note: Talend recommends Java 11. For this exercise, I am using OpenJDK-11 installed via Homebrew with:
% brew install openjdk@11
Accept the license agreement once the Talend Studio is installed and provide a license file or connection to your Talend Cloud.
Install Couchbase connector for Talend Studio
After launching Talend Studio, you will need to install the Couchbase connector to proceed. Navigate through Help to the Feature Manager and search for Couchbase.
After launching Talend Studio, navigate to Job Designs > Standard and right-click on New Job, enter the name MySQLToCouchbaseIntegration and click Finish.
This will create a panel where job artifacts can be dragged and dropped and should display the palette to the right. From the palette, drag and drop the tMyslInput widget to the left. Find the widget under Databases/MySQL.
From the palette, drag and drop tCouchbaseOutput widget (can be found under NoSQL Databases/Couchbase) to the right.
Double click on tMysqlInput on the panel; this will display the Component panel at the bottom.
Enter the MySQL database connection info.
- Click on the … button, located to the right of Table Name, expand world and select the City entry and click on the OK button.
- Remove the default query and replace with:
SELECT * FROM City
Below is an example screenshot of what it should look like:
Configure target schema
Click on the Guess schema button. This will bring up the Schema window; move and resize the window as appropriate.
Edit the schema with the following changes:
- Check Nullable for ID
- Check Nullable for Population
- Change ID type to String
Set DB types to the following and click on OK:
- INT for ID
- VARCHAR for fields Name, CountryCode, and District
- INT for Population
This screenshot shows all these schema settings:
Left-click on tMysqlInput and then right-click to select Row/Main and drag an arrow from the tMysqlInput component to tCouchbaseOutput. An arrow should be created between the two.
We will now navigate back to the Couchbase Capella console to get our connection endpoint. Navigate to Clusters > Trial Cluster > Connections and copy the Wide Area Network endpoint address:
Left-click on tCouchbaseOutput on the panel; this will display the Component panel at the bottom.
Enter Couchbase database connection info in the panel. Note that all fields must be surrounded by double quotes (i.e., “…”):
- The Bootstrap Nodes field is the value collected in the previous step preceded by couchbases://
- Provide credentials previously created in the Capella console for access.
- Set the Bucket name to world.
- Leave the default Document Type as JSON.
- Leave the field to use as ID as “ID”.
Click on the Run panel to build the job and execute it. You can see progress in the Job window.
Once complete, the job will have added 4079 documents to the world bucket:
You can access your Capella console directly or through the web control plane to verify they have loaded into the world bucket.
The Documents view shows what is now loaded:
Talend component setup for Couchbase input
Next, we will use Couchbase as an input and run a N1QL query. This will allow you to interact with the Couchbase Scopes and Collections.
Navigate to Job Designs > Standard and right-click on New Job – enter the name CouchbaseInput and click Finish.
From the palette, drag and drop the tCouchbaseInput widget to the left. Find the widget under Databases NoSQL/Couchbase.
From the palette, drag and drop the tLogRow widget (found under Logs and Errors) to the right.
Double-click on tCouchbaseInput in the panel; this will display the Component panel at the bottom. Enter the Couchbase database connection info. From the Query Type dropdown, select N1QL and enter the query as follows:
SELECT a.country FROM default:`travel-sample`.inventory.airline a
WHERE a.name = "Excel Airways";
Below is an example screenshot of how it should look:
Left-click on tLogRow on the panel; this will display the Component panel at the bottom.
Change the field separator from “|” to “” and select Print Header.
Connect the two components.
Run the job. If successful, it should return the value United Kingdom.
Continue Talend & Couchbase learning
Here are some further resources to help you dig deeper: