Couchbase Kafka connector connection error: Cannot fetch configuration for bucket

My project needs to capture changes from an old version of Couchbase (v4.5). To work around compatibility issues, an old version of the Couchbase Kafka connector (v3.1.2) is registered with Kafka Connect, which is launched using the Strimzi Kafka operator (v0.32). All related settings are correct, including the connection cluster address, username, password, and bucket. The connector is able to connect to the Couchbase server. However, it fails to start with the following error. Does anyone happen to know this issue and how to fix it?

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0$1 (file:/opt/kafka/plugins/couchbase-connector/28a854e7/kafka-connect-couchbase-3.1.2/share/java/kafka-connect-couchbase/core-io-1.4.2.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2022-12-02 04:14:07,256 ERROR [my-couchbase-connector|worker] WorkerConnector{id=my-couchbase-connector} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector) [connector-thread-my-couchbase-connector]
org.apache.kafka.connect.errors.ConnectException: Cannot fetch configuration for bucket my_bucket
    at com.couchbase.connect.kafka.CouchbaseSourceConnector.start(CouchbaseSourceConnector.java:55)
    at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:193)
    at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:218)
    at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:363)
    at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:346)
    at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:146)
    at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:123)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

It seems that the problem is caused by the old version of the core-io package (core-io-1.4.2.jar) that ships with the [couchbase connector](http://packages.couchbase.com/clients/kafka/3.1.2/kafka-connect-couchbase-3.1.2.zip) (v3.1.2) being used. However, it’s not clear whether this is the root cause of the problem. Simply creating a new zip file by replacing the core-io package with a newer version may cause other compatibility issues.

Hi @klyh ,

I exhumed the documentation for version 3.1 of the connector, and it looks like Couchbase 4.5 wants you to omit the connector’s connection.username config property (or, if you set it, set it to the bucket name). For connection.password, it wants the bucket password.

This version of the connector was never tested against anything newer than Java 8. You could try downgrading your Connect runtime to Java 8.

Mixing and matching core-io versions is not recommended.

Beyond that, I’m afraid I don’t have any advice. These versions are all well past EOL.

EDIT: Unless maybe there’s more to that stack trace? If you want to share more of the logs, I can take another quick look.

Thanks,
David


Hi @klyh ,

Mike Reiche did some detective work and found the code that logs this error.

Please double check you’re setting the connection.cluster_address connector config property. If that doesn’t solve the problem, we would expect to see a WARN message with a stack trace earlier in the logs, with a message that contains the string Ignoring error for node. If you want to share that part of the log, we’ll take a quick look.
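Since you’re running Connect via Strimzi, one way to dig that warning out of the pod logs is something like this (a sketch only; the pod name and namespace are placeholders you’d need to adjust):

# List the Connect pods, then search the relevant pod's log for the warning.
# "my-connect-cluster-connect-0" and "kafka" are placeholder names.
kubectl get pods -n kafka
kubectl logs my-connect-cluster-connect-0 -n kafka \
  | grep -B 2 -A 20 "Ignoring error for node"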

Thanks,
David

Thanks @david.nault for your response.

Following your suggestion, I removed the connection.username property and double-checked all the other properties in the JSON config I uploaded/registered to Kafka Connect. The connection.cluster_address is just the IP address of the CB server, which is correctly printed in the log messages.

I don’t see any WARN messages at all.

The WARNING messages in my original description appeared only when I first registered the connector with a freshly started Kafka Connect. No such messages, or any other WARN messages, appeared when I unregistered the connector and then registered it again.

Following is my connector config in JSON:

{
  "name": "um-ci-couchbase-connector",
  "config": {
    "name": "my-couchbase-connector",
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "tasks.max": "3",
    "connection.cluster_address": "<IPv4 Address>",
    "connection.timeout.ms": "100000",
    "connection.username": "Administrator",
    "connection.password": "<AdminPassword>",
    "connection.bucket": "my-bucket",
    "topic.name": "my-kafka-topic",
    "use_snapshots": "true"
  }
}

Thank you for sharing the config.

With Couchbase 4.5 and this old version of the Kafka connector (a corrected sketch follows this list):

  • connection.password must be the bucket password (or empty string if the bucket doesn’t have a password), not the administrator password.
  • connection.username should not be specified.
  • use_snapshots should be set to “false”, otherwise when the connector starts up it tries to buffer the entire contents of the bucket in memory before publishing anything to Kafka – not good, right?
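Putting those together, a corrected version of your config might look like the following. This is a sketch only; <BucketPassword> is a placeholder for the bucket’s actual password (use an empty string if the bucket has no password), and everything else is copied from your config:

{
  "name": "um-ci-couchbase-connector",
  "config": {
    "name": "my-couchbase-connector",
    "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
    "tasks.max": "3",
    "connection.cluster_address": "<IPv4 Address>",
    "connection.timeout.ms": "100000",
    "connection.password": "<BucketPassword>",
    "connection.bucket": "my-bucket",
    "topic.name": "my-kafka-topic",
    "use_snapshots": "false"
  }
}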

Thanks @david.nault. Now the problem has been solved with the right bucket password.

A follow-up question: since I set use_snapshots to false, what gets published to Kafka are the recent updates. How can I get a snapshot published if that’s important for my application?

Thanks for sharing the good news about the bucket password.

Unless my memory is wildly incorrect, snapshots themselves don’t get published, regardless of the “use_snapshots” setting. All this setting does is control whether the connector waits for a complete DCP snapshot to be received from Couchbase before any records in that snapshot are published to Kafka.

Setting use_snapshots=true never made much sense. DCP snapshots typically aren’t a useful concept unless you can atomically apply a consistent set of changes; that’s not possible with Kafka Connect, where you can’t atomically publish a set of records.

Changing the value of this config property does not affect which records get published to the stream; your application should not notice any difference.

Thanks,
David

@david.nault Thanks for your explanation. I did notice that the use_snapshots option has been removed in more recent versions of the CB Connector.

Does the CB Connector automatically get a snapshot of the specified CB bucket once connected, publish the snapshot records to Kafka, and then capture and publish changes to the bucket?

If not, how can we get the snapshot if my application needs that?

If yes, how much runtime overhead does the initial snapshot add to the CB server?

Either way, how much overhead does the change capturing add?

I think we’re using the word “snapshot” to mean different things.

What does “snapshot” mean to you?

Thanks for pointing that out, @david.nault.

To me, a snapshot means reading all docs in the specified bucket and publishing those docs to Kafka right after the CB connector starts. Does the CB connector do something like that?

This old version of the Couchbase Kafka connector does not have an option to control where streaming starts. It can only resume from the most recently saved stream position. If there is no saved stream position, it starts from the “beginning” and streams all documents in the bucket.

If you want the stream to include everything in the bucket (and not just documents that changed since the most recently saved stream position), you will need to invalidate the saved stream position.

The easiest way to do that is to rename the connector.

Try stopping and removing the connector, then recreating it with a different name. The connector should then publish all documents in the bucket.
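For example, via the Kafka Connect REST API (assuming it’s reachable at the default localhost:8083; the file name and new connector name below are placeholders):

# Remove the existing connector.
curl -X DELETE http://localhost:8083/connectors/um-ci-couchbase-connector

# Recreate it from a JSON file containing the same config but a different
# "name" (e.g. "um-ci-couchbase-connector-v2"), so there is no saved
# stream position to resume from.
curl -X POST -H "Content-Type: application/json" \
  --data @renamed-connector.json \
  http://localhost:8083/connectors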

Thanks,
David

Thanks @david.nault for the clarification.

Do we have some documentation on how much runtime overhead the initial snapshot and following streaming add to the CB server?

I don’t know. You could try asking in the "Couchbase Server" forum category, or engaging with your Couchbase support team.

It probably depends on your workload. I would recommend measuring the impact yourself.

Thanks,
David