A significant number of customers take advantage of the Couchbase integration with Apache Kafka, using the Couchbase Kafka connector plugin to reliably stream data to and from Apache Kafka at scale.

Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications. However, you need Apache Kafka infrastructure management expertise to architect, operate, and manage it on your own. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available service that makes it easy for you to build and run applications that use Apache Kafka. 

Amazon MSK integrates with Couchbase through the Amazon MSK Connect feature and the Couchbase Kafka connector plugin. Using this feature, you can easily deploy the Couchbase connector and scale it to adjust for changes in load.

In this blog post, we will walk through setting up an Amazon MSK cluster and using the Couchbase Kafka connector as both a “sink” and a “source”.

Step 1: Couchbase Capella cluster

    • Get started with the Couchbase Capella free trial
    • Select your preferred AWS Region and get a Couchbase Capella cluster running in minutes
    • Configure database credentials
    • Set up private networking using VPC peering or AWS PrivateLink for network connectivity with your AWS account. You can allow access from anywhere, but this is not recommended.
    • Access the Data Tools section to create a new demo bucket on the cluster

Step 2: Amazon MSK cluster and Amazon EC2 client

Using AWS CloudFormation

To get started quickly, we can use the AWS CloudFormation template from the Streaming Data Solution for Amazon MSK, which deploys an Amazon MSK cluster and an Amazon EC2 client instance.
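
As an alternative to launching the template from the console, the stack can be created from the AWS CLI. This is a minimal sketch; the stack name is arbitrary and the template URL is a placeholder for wherever you reference or host the solution template:

    # Launch the Streaming Data Solution for Amazon MSK stack
    # (the template URL below is a placeholder, not the real location)
    aws cloudformation create-stack \
        --stack-name msk-couchbase-demo \
        --template-url https://<bucket-hosting-the-solution-template>/streaming-data-solution-for-msk.template \
        --capabilities CAPABILITY_IAM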

    • Connect to the KafkaClient instance using the Session Manager option

    • Install Git and Apache Maven on the KafkaClient instance

    • Create the sample sink and source topics on the MSK cluster (see the sample commands after this list)

    • Open the file src/main/java/com/couchbase/connect/kafka/example/JsonProducerExample.java and update line 38 with the Kafka cluster bootstrap broker connection string
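
A minimal sketch of these client-side steps, run from the KafkaClient instance. The Kafka CLI path, cluster ARN, and bootstrap broker string are placeholders, and the topic names follow the couchbase-sink and couchbase-source topics used later in this post:

    # Install Git on the KafkaClient instance (Amazon Linux); install Apache Maven
    # from the Apache Maven site if it is not available in your package repositories
    sudo yum install -y git

    # Look up the bootstrap broker string for the MSK cluster
    aws kafka get-bootstrap-brokers --cluster-arn <your-msk-cluster-arn>

    # Create the sample sink and source topics
    ./kafka-topics.sh --create --bootstrap-server <bootstrap-broker-string> \
        --replication-factor 3 --partitions 1 --topic couchbase-sink
    ./kafka-topics.sh --create --bootstrap-server <bootstrap-broker-string> \
        --replication-factor 3 --partitions 1 --topic couchbase-source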

Step 3: Configure MSK Connect plugin

    • Download the Couchbase Kafka Connect plugin ZIP
    • Upload the ZIP file to an S3 bucket to which you have access (see the sample upload command after this list)
    • Open the Amazon MSK console. In the left pane, expand MSK Connect, then choose Custom plugins.
    • Choose Create custom plugin
    • Choose Browse S3. In the list of buckets, find the bucket where you uploaded the ZIP file, and then in the list of objects, select the ZIP file.
    • Enter couchbase-kafka-connect as the custom plugin name, then choose Create custom plugin.
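
For reference, the upload step can also be done from the command line. This is a sketch; the plugin version and bucket name are placeholders for the ZIP you downloaded from the Couchbase downloads page and a bucket you own:

    # Upload the Couchbase Kafka connector ZIP to your S3 bucket
    aws s3 cp couchbase-kafka-connect-couchbase-<version>.zip \
        s3://<your-plugin-bucket>/couchbase-kafka-connect-couchbase-<version>.zip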

Step 4: Create MSK Connector for Sink

    • Using the custom plugin, we can now create a sink connector. Open the Amazon MSK console. In the left pane, under MSK Connect, choose Connectors. Choose Create connector.
    • Choose the custom plugin couchbase-kafka-connect and choose Next
    • Enter couchbase-sink-example as the connector name
    • Select the MSK cluster created in Step 2. The sink connector configuration and IAM role are covered in the “Configure Couchbase MSK Connector for sink operation” section below.

Step 5: Create MSK Connector for Source

    • Using the custom plugin, we can now create a source connector. Open the Amazon MSK console. In the left pane, under MSK Connect, choose Connectors. Choose Create connector.
    • Choose the custom plugin couchbase-kafka-connect and choose Next.
    • Enter couchbase-source-example as the connector name.
    • Select the MSK cluster created in Step 2, then paste the source connector configuration shown after this list.

    • Get the Apache ZooKeeper connection string and the Kafka cluster bootstrap connection string from the “View client information” page for your MSK cluster; these values are used in the commands and configuration that follow.

    • Update line 38 of the JsonProducerExample.java source file with the MSK cluster bootstrap connection string.

    • In the security group of your MSK cluster, add an inbound rule that allows all traffic from the security group of the client instance, then choose Save rules. Your MSK cluster will now accept all traffic from the client you created in the previous procedure.
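
At this point the connector configuration field expects the source connector properties. The following is a minimal sketch modeled on the Couchbase Kafka connector quickstart; the Capella endpoint, credentials, and bucket are placeholders, the topic matches the couchbase-source topic created earlier, and property names should be verified against your connector version:

    connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
    tasks.max=2
    # Couchbase Capella connection details (placeholders; TLS is required for Capella)
    couchbase.seed.nodes=<your-capella-connection-hostname>
    couchbase.username=<database-username>
    couchbase.password=<database-password>
    couchbase.bucket=demo
    couchbase.enable.tls=true
    # Publish document change events as raw JSON to the couchbase-source topic
    couchbase.source.handler=com.couchbase.connect.kafka.handler.source.RawJsonSourceHandler
    couchbase.topic=couchbase-source
    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=org.apache.kafka.connect.converters.ByteArrayConverter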

Steps to create a client to connect with the MSK cluster

    • Create a Kafka client on an Amazon EC2 instance
    • This client will be used to send messages to the couchbase-sink topic on the MSK cluster and to monitor the couchbase-source topic for messages streaming into the MSK cluster (see the producer and consumer commands after this list).
    • Check the cluster Status on the Cluster summary page. The status changes from Creating to Active as Amazon MSK provisions the cluster. When the status is Active, you can connect to the cluster. For more information about cluster status, see Cluster states.
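
A minimal sketch of how the client can be used for this, assuming the Kafka CLI tools are installed on the EC2 instance and using the bootstrap broker string from the “View client information” page (add the appropriate client security properties if your cluster listeners require TLS or IAM authentication):

    # Send test JSON messages to the couchbase-sink topic (consumed by the sink connector)
    ./kafka-console-producer.sh --bootstrap-server <bootstrap-broker-string> \
        --topic couchbase-sink

    # Monitor the couchbase-source topic for document change events streamed from Capella
    ./kafka-console-consumer.sh --bootstrap-server <bootstrap-broker-string> \
        --topic couchbase-source --from-beginning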

Configure Couchbase MSK Connector for sink operation

    • Create an IAM role for MSK Connect based on the example policy.
    • Copy the following configuration and paste it into the connector configuration field.
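
The following sink configuration is a minimal sketch modeled on the Couchbase Kafka connector quickstart; the Capella endpoint, credentials, and bucket are placeholders, the topic matches the couchbase-sink topic created earlier, and property names should be verified against your connector version:

    connector.class=com.couchbase.connect.kafka.CouchbaseSinkConnector
    tasks.max=2
    # Kafka topic(s) to read from
    topics=couchbase-sink
    # Couchbase Capella connection details (placeholders; TLS is required for Capella)
    couchbase.seed.nodes=<your-capella-connection-hostname>
    couchbase.username=<database-username>
    couchbase.password=<database-password>
    couchbase.bucket=demo
    couchbase.enable.tls=true
    # Treat message values as schemaless JSON documents
    key.converter=org.apache.kafka.connect.storage.StringConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false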

Conclusion

This post illustrates how you can use the Couchbase source connector to publish document change notifications from Couchbase Capella to a Kafka topic, as well as the sink connector, which subscribes to one or more Kafka topics and writes the messages to Couchbase Capella.

Author

Posted by Saurabh Shabhag, Partner Solutions Architect, AWS
