Extend Couchbase Analytics with RapidMiner using CData

This article will guide you through the steps needed to set up the connection from RapidMiner to Couchbase Analytics using the CData JDBC driver for Couchbase. More details about this driver can be found in the CData documentation.

Prerequisites

Couchbase

You will first need a Couchbase Server Enterprise Edition (EE) 6.x cluster with the Data and Analytics services enabled. I am using a single-node local install of Couchbase Server EE, but the information in this article applies to any Couchbase Server EE cluster.

If you do not have an existing Couchbase Server EE cluster, the following links will get you up and running quickly:

  1. Download Couchbase Server EE
  2. Install Couchbase Server EE
  3. Provision a single-node cluster (NOTE: use the default values for cluster configuration)

CData JDBC driver for Couchbase

Next you will need to download and install the CData JDBC driver for Couchbase.

Once the driver is downloaded and unpacked, set up the license:

Command Line Activation

The setup process should automatically install a license for your system. However, you may also install a license from the command line via cdata.jdbc.couchbase.jar. To do so, execute the following command: java -jar cdata.jdbc.couchbase.jar -license. This process creates a cdata.jdbc.couchbase.lic file that must reside next to the jar or in the .cdata directory under the user’s home directory.

Trial License Installation

The setup process should automatically install a trial license for your system. You may also use the method described in the “Command Line Activation” section above to install a trial license. Simply enter “TRIAL” as the product key when prompted.

Note: Whichever method you use, the cdata.jdbc.couchbase.lic file must end up next to the jar or in the .cdata directory under the user’s home directory, for example “/Users/justinsimpson/.CData/cdata.jdbc.couchbase.lic”.
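If the driver later complains about a missing license, checking those locations directly can save some guesswork. The Python sketch below is not part of the CData tooling, and its function names are hypothetical. It looks for the file next to the jar and under both the .cdata and .CData spellings of the home-directory folder, since this article uses both (they are the same folder on a case-insensitive file system such as the macOS default).

```python
from pathlib import Path
from typing import List, Optional

LICENSE_NAME = "cdata.jdbc.couchbase.lic"


def license_candidates(jar_path: str, home: Optional[str] = None) -> List[Path]:
    """List the places where the license file is expected (hypothetical helper)."""
    jar_dir = Path(jar_path).parent
    home_dir = Path(home) if home is not None else Path.home()
    return [
        jar_dir / LICENSE_NAME,              # next to cdata.jdbc.couchbase.jar
        home_dir / ".cdata" / LICENSE_NAME,  # ~/.cdata, as described in the text
        home_dir / ".CData" / LICENSE_NAME,  # ~/.CData, as in the example path
    ]


def find_license(jar_path: str, home: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in license_candidates(jar_path, home):
        if candidate.is_file():
            return candidate
    return None
```

For example, find_license("/path/to/cdata.jdbc.couchbase.jar") returns the path it found, or None, in which case re-running java -jar cdata.jdbc.couchbase.jar -license is the next step.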

Couchbase Setup

In the Couchbase Web Console, click Settings, then Sample Buckets.

Then select the beer-sample checkbox and click Load Sample Data. You can then navigate back to Buckets and see beer-sample.

Once this is complete, we need to set up Analytics.

Select Analytics, then create a shadow dataset of beers from the beer-sample bucket.

Create Dataset in Couchbase Analytics

Click Execute; this creates the shadow dataset definition.
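The exact statements from the screenshots are not reproduced here. As a sketch of what shadow dataset definitions typically look like in Couchbase Server 6.x, assuming the 6.x SQL++ form CREATE DATASET name ON bucket WHERE condition, and assuming that beer-sample documents are told apart by their type field ("beer" or "brewery"), the hypothetical helper below renders one statement per dataset. The bucket name is backtick-quoted because it contains a hyphen.

```python
def create_dataset(name: str, bucket: str, doc_type: str) -> str:
    """Render a SQL++ statement that shadows the documents of one type in a bucket."""
    # Backticks are required around identifiers such as beer-sample that contain a hyphen.
    # The `type` filter is an assumption about how the sample documents are labeled.
    return f'CREATE DATASET {name} ON `{bucket}` WHERE `type` = "{doc_type}";'


# Dataset names follow this article (beers and breweries); the filters are assumptions.
for statement in (
    create_dataset("beers", "beer-sample", "beer"),
    create_dataset("breweries", "beer-sample", "brewery"),
):
    print(statement)
```

Each printed statement can be pasted into the Analytics query editor and run with Execute.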

I then repeat this step to create a second shadow dataset, breweries, with the following definition.

Next, initialize the datasets by activating them with the following.

Click Execute.

You can now test this out within the Analytics dashboard by running something like the following.
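The dashboard is how this article runs these statements. As an alternative technique (not part of the article's workflow), the Analytics service also accepts statements over its REST API: by default an HTTP POST to port 8095 at /analytics/service, with the statement in a statement form field and Basic authentication, answered by a JSON document whose status field reports the outcome and whose results array holds the rows. The sketch below uses only the Python standard library; the helper names, host, port, and credentials are assumptions, and the sample statements in the trailing comments are illustrative rather than the ones from the screenshots.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any, List


def analytics_request(statement: str, host: str, user: str, password: str,
                      port: int = 8095) -> urllib.request.Request:
    """Build (but do not send) a POST to the Analytics service REST endpoint."""
    url = f"http://{host}:{port}/analytics/service"
    # The statement is sent as a URL-encoded form field named "statement".
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    # HTTP Basic authentication with a cluster user's credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def analytics_results(response_body: str) -> List[Any]:
    """Return the rows from an Analytics JSON response, raising if status is not success."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Analytics statement failed: {payload.get('errors')}")
    # Statements such as CONNECT LINK may return no rows at all.
    return payload.get("results", [])


# Illustrative statements (assumptions, not taken from the screenshots):
#   "CONNECT LINK Local;"                        activates shadow datasets in 6.x
#   "SELECT COUNT(*) AS beer_count FROM beers;"  a simple sanity check
```

To send a request, pass it to urllib.request.urlopen and hand the decoded response body to analytics_results. Note that urlopen raises urllib.error.HTTPError when the service answers with an HTTP error code, so a failed statement may surface there rather than in analytics_results.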

More about Couchbase Analytics can be found in the Couchbase documentation.

Your setup for Couchbase is complete!

Set Up RapidMiner

Using RapidMiner as an extension of Couchbase Analytics takes two basic steps:

  1. Set up a connection
  2. Create a process with two ‘Read Database’ operators. You might also want to store those results locally so you can combine them with other operators and processes within RapidMiner.

Set Up a Connection

Within RapidMiner, I start from a Blank Process. Under Connections, I select Create Connection and give it a connection name. In this example I use ‘CBLocal’.

Setup RapidMiner Connection JDBC Connection

On the Setup tab, I make sure the Database system is set to “Custom (configure in Driver tab)” and I select Configure URL Manually.

Setup RapidMiner Connection JDBC URL for Couchbase Analytics

I populate the URL with the following:

All of the connection string options and details can be found under the CData JDBC connection string options.
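For orientation, a CData JDBC URL is a jdbc:couchbase: prefix followed by semicolon-separated Property=Value pairs. The Python sketch below assembles one. The property names it uses (User, Password, Server, and CouchbaseService=Analytics to target the Analytics service rather than the query service) are assumptions to confirm against the CData connection string options, and the example values are hypothetical.

```python
from typing import Dict


def cdata_couchbase_url(server: str, user: str, password: str,
                        service: str = "Analytics") -> str:
    """Assemble a CData-style JDBC URL (property names are assumptions; see lead-in)."""
    props: Dict[str, str] = {
        "User": user,
        "Password": password,
        "Server": server,
        "CouchbaseService": service,  # assumed switch between the query and Analytics services
    }
    for key, value in props.items():
        # This sketch does not attempt the driver's quoting rules; it rejects separators instead.
        if ";" in value:
            raise ValueError(f"{key} contains ';', which this sketch does not escape")
    return "jdbc:couchbase:" + "".join(f"{key}={value};" for key, value in props.items())


print(cdata_couchbase_url("127.0.0.1", "Administrator", "password"))
# jdbc:couchbase:User=Administrator;Password=password;Server=127.0.0.1;CouchbaseService=Analytics;
```

The resulting string is the kind of value that goes into the URL field when Configure URL Manually is selected.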

Next, select the Driver tab to finish the setup.

To set up the JDBC driver jar file, click the folder icon and browse to the location of cdata.jdbc.couchbase.jar. Once this is selected, you can choose ‘cdata.jdbc.couchbase.CouchbaseDriver’ in the dropdown list.
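If the driver class does not appear in the dropdown, it can help to confirm that the selected jar actually contains it. A jar is a zip archive, so a minimal check (a hypothetical helper, not a RapidMiner or CData feature) is to look for the class file that corresponds to cdata.jdbc.couchbase.CouchbaseDriver.

```python
import zipfile
from typing import BinaryIO, Union

DRIVER_CLASS = "cdata.jdbc.couchbase.CouchbaseDriver"


def class_entry(class_name: str) -> str:
    """Map a fully qualified Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_has_class(jar: Union[str, BinaryIO], class_name: str = DRIVER_CLASS) -> bool:
    """Return True if the jar (a zip archive) contains the compiled class."""
    with zipfile.ZipFile(jar) as archive:
        return class_entry(class_name) in archive.namelist()
```

For an intact download, jar_has_class("/path/to/cdata.jdbc.couchbase.jar") should return True; False suggests the wrong or a damaged jar was selected.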

Setup RapidMiner Connection JDBC Driver for Couchbase Analytics

You can now click Test connection to verify your setup is complete.

Using RapidMiner

Now that RapidMiner has a new connection configured, it's time to load some data!

Start from a blank process.

  1. Drag and drop the ‘Read Database’ operator (it's important to connect the output (out) to the results (res) in the Process window)
  2. Select the connection you just created
  3. Select Build SQL Query and enter the query you would like to pass to Couchbase Analytics
  4. Click the ‘Play’ button to get the results!

Setup RapidMiner Read Database Operator from Couchbase Analytics

My result set looks like this…

RapidMiner Results from Couchbase Analytics

To store those results and build multiple datasets for use with other RapidMiner tools, drag a ‘Store’ operator into the process and set the location where you would like to store the data.

Note: Make sure the output (out) of the ‘Read Database’ operator is connected to the input (inp) of the ‘Store’ operator.

Setup RapidMiner Store

I then repeated this process for the other shadow dataset we created, ‘breweries’, as you can see above under the Data section.

More about RapidMiner Studio can be found in the RapidMiner documentation.

Next Steps

Download Couchbase, set up Analytics, start using RapidMiner with your data, and see what insights you can glean. Extend Analytics with other tools using the many CData drivers for Couchbase that are at your fingertips.

Author

Posted by Justin Simpson, Solutions Engineer, Couchbase

Justin Simpson is a Solutions Engineer at Couchbase and has been working in IT and technology since 2004. He is based in the Cincinnati, Ohio area.
