Kafka Connector 3 Developer Preview 1

I’m glad to announce the first developer preview of the next major iteration of our Kafka integration. This version is based on a new library for DCP (Couchbase’s Database Change Protocol) and supports the Kafka Connect framework. In this post I will show how to use it to relay data from Couchbase to HDFS.

Here I’ll show the steps for CentOS/Fedora Linux distributions; the steps on other operating systems should be similar. First, install Confluent Platform (https://docs.confluent.io/3.0.0/installation.html#rpm-packages-via-yum), then download the zip archive with the Couchbase connector (https://packages.couchbase.com/clients/kafka/3.0.0-DP1/kafka-connect-couchbase-3.0.0-DP1.zip).

To register the connector, extract the contents of the archive into the default class path. On CentOS and Fedora that is /usr/share/java:

unzip kafka-connect-couchbase-3.0.0-DP1.zip
sudo cp -a kafka-connect-couchbase-3.0.0-DP1/share /usr/

Now start Confluent Control Center and all the services it depends on. Without extra flags each command stays in the foreground, so run each one in its own terminal. Confluent’s quickstart guide explains what these commands do.

sudo zookeeper-server-start /etc/kafka/zookeeper.properties
sudo kafka-server-start /etc/kafka/server.properties
sudo schema-registry-start /etc/schema-registry/schema-registry.properties
sudo connect-distributed /etc/kafka/connect-distributed.properties
sudo control-center-start /etc/confluent-control-center/control-center.properties
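
Before moving on, it can help to confirm that each service actually came up. Here is a minimal sketch that checks whether anything is listening on the expected ports. The port numbers are my assumption based on the stock Confluent Platform configuration files (only 9021 for Control Center is taken from this post), so adjust them if your properties files differ.

```python
import socket

# Assumed default ports from stock Confluent Platform configs.
# Change these if your *.properties files use different values.
DEFAULT_PORTS = {
    "zookeeper": 2181,
    "kafka": 9092,
    "schema-registry": 8081,
    "connect": 8083,
    "control-center": 9021,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1", ports: dict = DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```

An open port only tells you the process is accepting connections, not that it is healthy, so treat this as a quick sanity check rather than a full health probe.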

At this point everything is ready to set up the link that transfers documents from Couchbase to HDFS using Kafka Connect. We assume you are running Couchbase Server at https://127.0.0.1:8091/ and Confluent Control Center at https://127.0.0.1:9021/. For this example, make sure the travel-sample bucket is loaded in Couchbase. If you didn’t load it when setting up the cluster, you can add it through the Settings section of the web UI.

Once you have all of these prerequisites out of the way, navigate to the “Kafka Connect” section of Confluent Control Center. Select “New source”, then select “CouchbaseSourceConnector” as the connector class and fill in the settings so that the final JSON looks similar to this:

{
  "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
  "name": "travel-source",
  "connection.bucket": "travel-sample",
  "connection.cluster_address": "127.0.0.1",
  "topic.name": "travel-topic"
}
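
If you prefer scripting to clicking through the UI, the same settings can in principle be submitted to the Kafka Connect REST API, which accepts a POST to /connectors with a body of the form {"name": ..., "config": {...}}. The following is a minimal sketch, not something this preview was specifically tested with: the Connect URL http://127.0.0.1:8083 is an assumed default, and the helper names `to_rest_payload` and `register` are mine.

```python
import json
import urllib.request

# Flat settings exactly as shown above (the form Control Center displays).
source_settings = {
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "name": "travel-source",
    "connection.bucket": "travel-sample",
    "connection.cluster_address": "127.0.0.1",
    "topic.name": "travel-topic",
}


def to_rest_payload(settings: dict) -> dict:
    """Wrap flat settings in the {"name", "config"} body used by POST /connectors.

    The name is kept inside "config" as well, which is harmless and avoids
    depending on the worker to copy it in.
    """
    return {"name": settings["name"], "config": dict(settings)}


def register(settings: dict, connect_url: str = "http://127.0.0.1:8083") -> dict:
    """POST the connector definition to a Connect worker (URL is an assumption)."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(to_rest_payload(settings)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the request body here; call register(source_settings)
    # once a Connect worker is actually running.
    print(json.dumps(to_rest_payload(source_settings), indent=2))
```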


Once you save the source connection, the Connect daemon will start receiving mutations and storing them in the specified Kafka topic. To demonstrate a full pipeline, let’s set up a sink connection to get the data out of Kafka. Go to the “Sinks” tab and click the “New sink” button. It will ask for the topics that hold the data you are interested in; enter “travel-topic”. Then select “HdfsSinkConnector” and fill in the settings so that the JSON config looks like this (assuming the HDFS name node is listening on hdfs://127.0.0.1:8020/):

{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "name": "hdfs-travel-sink",
  "flush.size": "10",
  "partitioner.class": "io.confluent.connect.hdfs.partitioner.FieldPartitioner",
  "partition.field.name": "partition",
  "hdfs.url": "hdfs://127.0.0.1:8020",
  "topics": "travel-topic"
}
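
To make the partitioner settings a little more concrete: with FieldPartitioner and partition.field.name set to “partition”, records are grouped into one directory per value of that field, named field=value under the topic’s directory. The sketch below only imitates that naming in plain Python to show the resulting layout. It does not call the connector, the helper names are mine, and every sample record other than route_28879 is made up for illustration.

```python
from collections import defaultdict


def partition_directory(topic: str, field_name: str, record: dict,
                        topics_dir: str = "/topics") -> str:
    """Directory a record lands in when partitioned by one of its fields."""
    return f"{topics_dir}/{topic}/{field_name}={record[field_name]}"


def group_by_directory(records: list, topic: str, field_name: str) -> dict:
    """Group record keys by the directory they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_directory(topic, field_name, record)].append(record["key"])
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records, except route_28879 in partition 89.
    records = [
        {"key": "route_28879", "partition": 89},
        {"key": "example_a", "partition": 89},
        {"key": "example_b", "partition": 512},
    ]
    for directory, keys in group_by_directory(records, "travel-topic", "partition").items():
        print(directory, keys)
```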


Once the sink connection is configured, you will see data appearing on HDFS under /topics/travel-topic/ (the default topics directory). Let’s inspect one of the files:

$ hdfs dfs -fs hdfs://localhost:8020 -cat /topics/travel-topic/partition=89/travel-topic+0+0000000101+0000000101.avro | avropipe
/   []
/0  {}
/0/partition    89
/0/key  "route_28879"
/0/expiration   0
/0/flags    33554438
/0/cas  1471633063247347712
/0/lockTime 0
/0/bySeqno  1
/0/revSeqno 1
/0/content  "{"id":28879,"type":"route","airline":"G4","airlineid":"airline_35","sourceairport":"AZA","destinationairport":"FWA","stops":0,"equipment":"319","schedule":[{"day":0,"utc":"01:59:00","flight":"G4097"},{"day":1,"utc":"09:30:00","flight":"G4697"},{"day":1,"utc":"09:50:00","flight":"G4879"},{"day":1,"utc":"07:44:00","flight":"G4310"},{"day":1,"utc":"01:23:00","flight":"G4226"},{"day":2,"utc":"19:58:00","flight":"G4921"},{"day":2,"utc":"09:49:00","flight":"G4376"},{"day":2,"utc":"17:57:00","flight":"G4446"},{"day":2,"utc":"21:06:00","flight":"G4032"},{"day":3,"utc":"17:05:00","flight":"G4198"},{"day":3,"utc":"12:21:00","flight":"G4098"},{"day":3,"utc":"19:31:00","flight":"G4571"},{"day":4,"utc":"05:27:00","flight":"G4001"},{"day":4,"utc":"07:03:00","flight":"G4023"},{"day":4,"utc":"16:50:00","flight":"G4631"},{"day":5,"utc":"18:13:00","flight":"G4757"},{"day":6,"utc":"20:35:00","flight":"G4157"},{"day":6,"utc":"21:52:00","flight":"G4582"},{"day":6,"utc":"00:55:00","flight":"G4348"},{"day":6,"utc":"06:01:00","flight":"G4731"}],"distance":2483.859992489083}"
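
Note that the content field carries the Couchbase document as serialized JSON, so a consumer has to decode it before it can work with the document’s fields. Here is a minimal sketch of that step, assuming you have already pulled the content value out of the Avro record as text or bytes. The sample is a shortened copy of the record above (the schedule is trimmed to its first entry), and `summarize_route` is a name I made up for illustration.

```python
import json

# Shortened copy of the `content` value shown above:
# the schedule is trimmed to its first entry.
content = (
    '{"id":28879,"type":"route","airline":"G4","airlineid":"airline_35",'
    '"sourceairport":"AZA","destinationairport":"FWA","stops":0,'
    '"schedule":[{"day":0,"utc":"01:59:00","flight":"G4097"}],'
    '"distance":2483.859992489083}'
)


def summarize_route(raw) -> dict:
    """Decode a route document from the content field and pick out a few fields."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    doc = json.loads(raw)
    return {
        "route": f'{doc["sourceairport"]}->{doc["destinationairport"]}',
        "airline": doc["airline"],
        "flights": [entry["flight"] for entry in doc["schedule"]],
    }


if __name__ == "__main__":
    print(summarize_route(content))
```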


That’s my quick walkthrough! The DCP client is still under active development, and features are being added to handle various topology changes and failure scenarios. The next couple of updates to our Kafka connector will pick those up. I should also note that Couchbase’s DCP client interface should be considered volatile for the moment. We use it in various projects, but you should only use it directly at your own risk.

The source code for the connector is at https://github.com/couchbaselabs/kafka-connect-couchbase. The issue tracker is at https://issues.couchbase.com/projects/KAFKAC, and feel free to ask any questions on https://www.couchbase.com/forums/.

Author

Posted by Sergey Avseyev, SDK Engineer, Couchbase
