A few questions about the `couchbase.stream.from` field in `kafka-connector`

Hi all :wave:,

I am looking for additional information about the `couchbase.stream.from` field described on the connector's "Source Configuration Options" page in the Couchbase Docs.

Could you please clarify what `BEGINNING` means and how it works?

In particular, I would like answers to the following:

  • Does it mean the connector will read the entire configured bucket and publish it to the Kafka topic?
  • How does it control the volume? That is, what happens if the Kafka topic cannot hold the entire bucket? Is this what `couchbase.flow.control.buffer` controls? How do we avoid degrading Couchbase performance during such a load?
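For reference, here is a minimal sketch of the source connector settings these questions are about, written as a Python helper that builds the JSON body one would POST to the Kafka Connect REST API (`/connectors`). It assumes connector 4.x property names; the connector name, seed nodes, credentials, bucket, topic and the `16m` buffer size are hypothetical placeholders, not our real values.

```python
import json

# Values of couchbase.stream.from as we understand them from the 4.x docs (assumption).
STREAM_FROM_VALUES = {"SAVED_OFFSET_OR_BEGINNING", "SAVED_OFFSET_OR_NOW", "BEGINNING", "NOW"}


def couchbase_source_config(stream_from: str = "BEGINNING",
                            flow_control_buffer: str = "16m") -> dict:
    """Build a Kafka Connect request body for the Couchbase source connector.

    All names, hosts and sizes below are hypothetical placeholders.
    """
    if stream_from not in STREAM_FROM_VALUES:
        raise ValueError(f"unsupported couchbase.stream.from value: {stream_from}")
    return {
        "name": "couchbase-source",  # hypothetical connector name
        "config": {
            "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
            "couchbase.seed.nodes": "couchbase.example.internal",  # placeholder host
            "couchbase.username": "kafka-connector",  # placeholder credentials
            "couchbase.password": "change-me",
            "couchbase.bucket": "my-bucket",  # placeholder bucket
            "couchbase.topic": "couchbase-my-bucket",  # placeholder topic
            # Where in the bucket's history to start streaming (the field in question).
            "couchbase.stream.from": stream_from,
            # Flow-control buffer size; whether it bounds the initial load is part of the question.
            "couchbase.flow.control.buffer": flow_control_buffer,
        },
    }


if __name__ == "__main__":
    # Print the body, e.g. to pipe into:
    # curl -X POST -H 'Content-Type: application/json' --data @- http://localhost:8083/connectors
    print(json.dumps(couchbase_source_config(), indent=2))
```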

Additionally, if possible, I would like to hear your thoughts on bulk loads. In short, I need to ingest an entire bucket (100M+ records) from Couchbase into another datastore (Elasticsearch) while also processing and ingesting incoming real-time traffic.

Currently, we’re considering two separate workflows: one for the initial load and another for real-time traffic (using Kafka Connect). Given that the connector offers `couchbase.stream.from`, I am curious whether Kafka Connect could handle both without two separate processes.
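The single-workflow idea, sketched as two Kafka Connect definitions that share one topic: a Couchbase source starting from `BEGINNING` and an Elasticsearch sink reading that topic. The sink assumes Confluent's Elasticsearch sink connector, and the record key is assumed to be the Couchbase document ID; all names, hosts and the topic are hypothetical placeholders. Whether this holds up for 100M+ documents is exactly what we are unsure about.

```python
import json

TOPIC = "couchbase-my-bucket"  # hypothetical topic shared by both connectors


def pipeline_configs(topic: str = TOPIC) -> list:
    """Return hypothetical source and sink definitions for one Couchbase -> Kafka -> Elasticsearch pipeline."""
    source = {
        "name": "couchbase-source",
        "config": {
            "connector.class": "com.couchbase.connect.kafka.CouchbaseSourceConnector",
            "couchbase.seed.nodes": "couchbase.example.internal",  # placeholder host
            "couchbase.bucket": "my-bucket",  # placeholder bucket
            "couchbase.topic": topic,
            # Intended behavior (to be confirmed): existing documents first, then live changes.
            "couchbase.stream.from": "BEGINNING",
        },
    }
    sink = {
        "name": "elasticsearch-sink",
        "config": {
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "connection.url": "http://elasticsearch.example.internal:9200",  # placeholder URL
            "topics": topic,
            # Use the record key (assumed: the Couchbase document ID) as the Elasticsearch _id,
            # so updated or re-delivered documents overwrite instead of duplicating.
            "key.ignore": "false",
            "schema.ignore": "true",
        },
    }
    return [source, sink]


if __name__ == "__main__":
    for body in pipeline_configs():
        print(json.dumps(body, indent=2))
```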

Any help would be highly appreciated.