Our admins periodically upgrade Couchbase. For us, Couchbase is a source: we stream from it with Kafka Connect by subscribing to the DCP channel.
During the upgrade process:
- A new cluster with new IPs and hostnames is built
- XDCR is used to replicate data from the old cluster to the new one
- The DNS SRV record is updated
Our Kafka Connect process depends on the vBucket UUID and sequence number to decide where to begin streaming. Does XDCR replication keep the documents under the same vBucket UUIDs, or can these change when data is replicated to a new cluster?
If I understand you correctly, the “upgrade” process is really a replacement process: you build a new cluster, and the old cluster’s data is XDCR’ed to it. The Kafka connector that used to subscribe to the old cluster is then pointed at the new cluster. Please let me know if this is correct.
In each case, the VBUUID of a VBucket is generated randomly, and VBUUID information is not synchronized between source and target clusters.
When a source bucket is created, each VB is assigned a VBUUID at creation time. The same happens for the target bucket. Thus, when you XDCR from the old cluster to the new cluster, even though the data on the new cluster is the same, the VBUUID of each VB holding the data will be different.
When you then reconnect the Kafka connector to the “new” target bucket, it will see a completely new set of VBUUIDs, unrelated to those of the source.
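To make the consequence concrete, here is a minimal Python sketch of the resume decision. It is a simplified model, not the connector's actual code: it assumes a saved offset can be resumed only if its VBUUID appears in the failover log of the same vBucket on the cluster being connected to, and it ignores the sequence-number comparison a real DCP stream request also involves. The names `SavedOffset` and `plan_resume` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SavedOffset:
    """Hypothetical shape of a per-vBucket offset saved by a DCP consumer."""
    vbucket: int
    vbuuid: int
    seqno: int


def plan_resume(
    saved: Dict[int, SavedOffset],
    failover_logs: Dict[int, List[Tuple[int, int]]],
) -> Dict[int, int]:
    """Return the sequence number to start streaming from for each vBucket.

    failover_logs maps vBucket id -> list of (vbuuid, seqno) entries as
    reported by the cluster being connected to. Simplification: a saved
    offset is treated as resumable whenever its VBUUID is in that list.
    """
    plan = {}
    for vb, offset in saved.items():
        known_vbuuids = {vbuuid for vbuuid, _ in failover_logs.get(vb, [])}
        if offset.vbuuid in known_vbuuids:
            plan[vb] = offset.seqno  # same history: continue where we stopped
        else:
            plan[vb] = 0  # unknown history: start again from the beginning
    return plan


# Offsets saved while connected to the old cluster.
saved = {0: SavedOffset(vbucket=0, vbuuid=111, seqno=500)}

# The old cluster still knows VBUUID 111; the new cluster generated its own.
old_cluster_logs = {0: [(111, 0)]}
new_cluster_logs = {0: [(999, 0)]}

print(plan_resume(saved, old_cluster_logs))  # resumes at the saved seqno
print(plan_resume(saved, new_cluster_logs))  # falls back to seqno 0
```

Under this model, every offset saved against the old cluster is unusable on the new one, so the connector effectively has to start over.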
Thanks for your response. Your understanding is correct. We will keep this in mind during our next upgrade and clear the Kafka Connect offsets, config, and status topics. Basically, we will have to flush out the connector whenever the VBUUIDs change.
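As an alternative to clearing the topics by hand, the saved offsets could be reset through the Kafka Connect REST API. The sketch below is a hedged illustration, assuming Kafka Connect 3.6 or later (where `PUT /connectors/{name}/stop` and `DELETE /connectors/{name}/offsets` are available); it resets only the source offsets and does not touch the config or status topics. The function names, the worker URL, and the connector name `couchbase-source` are hypothetical.

```python
import urllib.request
from typing import Callable, List, Tuple
from urllib.parse import quote


def http_send(method: str, url: str) -> int:
    """Send a body-less HTTP request and return the status code."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status


def reset_connector_offsets(
    base_url: str,
    connector: str,
    send: Callable[[str, str], int] = http_send,
) -> List[Tuple[str, str]]:
    """Stop a connector, delete its saved offsets, then resume it.

    Assumption: the worker runs Kafka Connect 3.6+. Stopping is
    asynchronous, so a real script may need to poll the connector status
    until it reports STOPPED before the DELETE succeeds.
    """
    base = base_url.rstrip("/")
    name = quote(connector, safe="")
    steps = [
        ("PUT", f"{base}/connectors/{name}/stop"),
        ("DELETE", f"{base}/connectors/{name}/offsets"),
        ("PUT", f"{base}/connectors/{name}/resume"),
    ]
    for method, url in steps:
        status = send(method, url)
        if status >= 300:
            raise RuntimeError(f"{method} {url} failed with HTTP {status}")
    return steps


# Dry run with a fake sender, so nothing is actually sent over the network.
calls = []


def fake_send(method: str, url: str) -> int:
    calls.append((method, url))
    return 200


steps = reset_connector_offsets("http://localhost:8083/", "couchbase-source", fake_send)
for method, url in steps:
    print(method, url)
```

Passing the sender as a parameter keeps the sequence testable without a running Connect worker; calling `reset_connector_offsets` without `send` would issue the real requests.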