Source Connector stream_from setting, how far back is the history?

When couchbase.stream_from is set to BEGINNING or SAVED_OFFSET_OR_BEGINNING and the Connector worker is started for the first time (i.e., there is no saved offset yet), exactly how far back in history should we expect to see data streaming from? Are there any controls for this? Do events ever become stale/purged?

Hi Atkinsr,

Couchbase Server periodically compacts older parts of the event history. The latest version of each document is always present in the history, though.

exactly how far back in history should we expect to see data streaming from?

You’ll get everything, except for events that have been deduplicated due to compaction. In other words, you’re guaranteed to get at least the latest version of each document, and you might get earlier versions as well.

For example, if the “real” history is A1 A2 B1 (two versions of document A, then one version of document B), you might get all three events, or you might just get A2 B1, depending on whether the server has compacted that portion of history.
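
To make the A1 A2 B1 case concrete, here is a toy Python model of the deduplication. It is only a sketch of the behavior described above, not how Couchbase Server or the connector is implemented. The function name `stream_from_beginning` and the `compacted_prefix_len` parameter are made up for illustration, and the model assumes compaction simply drops any event in the compacted range that has a later event for the same document.

```python
from typing import List, Tuple

# (document key, version number); e.g. ("A", 2) stands for "A2"
Event = Tuple[str, int]


def stream_from_beginning(history: List[Event], compacted_prefix_len: int) -> List[Event]:
    """Return the events a fresh consumer would see in this toy model.

    Assumption (hypothetical model): the first `compacted_prefix_len` events
    have been compacted, and compaction removes an event only if a later
    event for the same key exists somewhere in the history.
    """
    # Index of the most recent event for each document key.
    latest_index = {}
    for i, (key, _version) in enumerate(history):
        latest_index[key] = i

    visible = []
    for i, event in enumerate(history):
        superseded = latest_index[event[0]] != i
        if i < compacted_prefix_len and superseded:
            continue  # deduplicated away by compaction
        visible.append(event)
    return visible


history = [("A", 1), ("A", 2), ("B", 1)]

# Nothing compacted yet: every event is still there.
print(stream_from_beginning(history, 0))

# Whole history compacted: only the latest version of each document remains.
print(stream_from_beginning(history, 3))
```

Either way, the last event for each document (A2 and B1 here) always survives, which matches the guarantee above.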


Thank you. What about expired documents? We have lots of new documents added daily, but we also have a 2-day expiration on these documents (for this particular use case). Am I going to be getting events for documents that were added a week ago, but are no longer in the bucket due to expiration?

Expiration is the same as deletion, as far as the connector is concerned.

It comes down to compaction again. If the document was both created and deleted within a time period that has been compacted, I don’t think you’ll get any events for the document. (Disclaimer: this is just an educated guess.)