Hi @mreiche
From what I understand, it is a load balancer that resolves to any healthy Couchbase node. Would that be fine?
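For reference, the connection-related properties simply point at that load balancer's DNS name, along the lines of the fragment below (the hostname and credentials are placeholders, and these are among the properties omitted from the configs further down):
"couchbase.seed.nodes": "cb-lb.example.com",
"couchbase.username": "user",
"couchbase.password": "***"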
In terms of the data loss, we have two separate Connectors listening to the same Couchbase cluster updates and publishing to two separate Kafka topics.
Configuration examples
(some properties were omitted for brevity)
Connector 1:
{
"connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
"producer.override.compression.type": "gzip",
"couchbase.bootstrap.timeout": "600s",
"couchbase.compression": "ENABLED",
"tasks.max": "6",
"couchbase.log.document.lifecycle": "true",
"couchbase.source.handler": "com.custom.handler",
"couchbase.bucket": "bucket",
"couchbase.stream.from": "SAVED_OFFSET_OR_BEGINNING",
"couchbase.log.redaction": "FULL",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"couchbase.xattrs": "true",
"couchbase.topic": "topic1"
}
Connector 2:
{
"connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
"couchbase.bootstrap.timeout": "900s",
"tasks.max": "10",
"couchbase.compression": "ENABLED",
"transforms": "deserializeJson",
"couchbase.log.document.lifecycle": "true",
"couchbase.source.handler": "com.custom.handler",
"couchbase.bucket": "bucket",
"couchbase.username": "user",
"couchbase.stream.from": "SAVED_OFFSET_OR_NOW",
"value.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms.deserializeJson.type": "com.couchbase.connect.kafka.transform.DeserializeJson",
"couchbase.topic": "topic2"
}
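To make the comparison easier, the differences between the two configs that stand out to me (besides task counts and timeouts) are the offset fallback and the value conversion:
Connector 1:
"couchbase.stream.from": "SAVED_OFFSET_OR_BEGINNING",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
Connector 2:
"couchbase.stream.from": "SAVED_OFFSET_OR_NOW",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"transforms": "deserializeJson"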
We noticed that the two topics were not in sync. When we looked at the logs, we found that connector 2 had not published a large number of updates (3M+ in fact).
Both connectors use the same Kafka Connect deployment and the same underlying Kafka infrastructure.
The only log entry that looks suspicious is the one below, but I do not think it is enough to explain the data loss:
level=WARN connector_context=[cb-connect|task-0] [Producer clientId=connector-producer-cb-connect] Received invalid metadata error in produce request on partition topic due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now class=org.apache.kafka.clients.producer.internals.Sender line=650
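As far as I know, NotLeaderOrFollowerException is a retriable error, so the producer should refresh its metadata and retry within its delivery timeout rather than drop the record. In case it is relevant, the retry-related producer settings can be overridden per connector in the same way as the compression setting above; the values below are just the Kafka defaults as I understand them, shown for illustration:
"producer.override.acks": "all",
"producer.override.retries": "2147483647",
"producer.override.delivery.timeout.ms": "120000"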
I have couchbase.log.document.lifecycle enabled, but I do not see anything in the logs to suggest that the missing records were ever persisted to Kafka by connector 2. Connector 1 did persist them.