DCP durability (Kafka Connector)

Couchbase Server 7, Enterprise Edition.

We are implementing a data pipeline with Couchbase, our application database, as the source.

We will use the Couchbase Kafka connector.

Our goal (among others) is to maintain a change history of our data.

The Kafka source connector first produces the current state of the database to a topic on the broker, followed by all subsequent mutations. Once this change history is in Kafka, we feel reasonably confident about its durability and replayability.
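For concreteness, our source connector configuration looks roughly like the following sketch. (Property names assume the 4.x connector and may differ in other versions; the addresses, credentials, bucket, and topic names are placeholders.)

```properties
name=couchbase-source
connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
tasks.max=2

# Placeholder connection details.
couchbase.seed.nodes=127.0.0.1
couchbase.bucket=app-bucket
couchbase.username=kafka-connector
couchbase.password=secret

# Topic that receives the initial state and all subsequent mutations.
couchbase.topic=app-bucket.changes

# Resume from the saved offset if one exists; otherwise stream from the beginning.
couchbase.stream.from=SAVED_OFFSET_OR_BEGINNING
```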

The danger zone is the source connector itself. If the source connector goes down (yes, I understand that we run in a cluster and there are replicas, but stuff happens), then as far as we can tell, the DCP changes that occur between the connector going down and being brought back up are simply lost forever.

When the connector comes back up, it becomes aware of, and starts from, the current state of the data.

We are nervous about building a business intelligence story on top of our data's history and integrating it into our business, only to have an outage of some sort two years down the line punch an irretrievable hole in that history.

Ideally, Couchbase would store the DCP event log somewhere we could access, so we could replay the changes on top of some snapshot. (This would also cover the case where both the Kafka broker and the worker cluster go down.) Ideally, Couchbase would maintain a durability contract for the history of the data.

I am testing this with a connector in standalone mode, and indeed, when I crash the connector and bring it back up, it simply resumes from the current state of the data, even though the setting is lastSaveOrFromBeginning. (Someone on the Confluent forums suggested that a distributed deployment might in fact behave the way I'm looking for; I will see tomorrow.)

(UPDATE: it is not yet 'tomorrow', so I haven't retested anything in distributed mode, but I just became aware of https://issues.couchbase.com/browse/KAFKAC-237 and accordingly may have misinterpreted my previous test results.)
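For what it's worth, my understanding is that a standalone worker persists source offsets only to the local file named in its worker config, so if that file is lost after a crash (or the connector name changes), there is no saved offset to resume from and the connector falls back to its stream-from behavior. My worker config looks roughly like this (standard Kafka Connect properties; servers and paths are placeholders):

```properties
# connect-standalone.properties (worker config)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Standalone mode stores source offsets in this local file; if it is lost,
# there is no saved offset for the connector to resume from.
offset.storage.file.filename=/var/lib/connect/connect.offsets
offset.flush.interval.ms=10000
```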

Is this possible now?

The Kafka connector guarantees it will eventually deliver the latest version of each document; it does not guarantee it will deliver a full history of every change. If your use case requires capturing every change to a document, you might want to adopt one of the strategies detailed by Jon in your related thread: Is there any bound to Mutation deduping - #2 by jon.strabala
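For illustration only (this is a sketch of one common shape of such strategies, not a quote from Jon's thread): the idea is for the application to write each version of a document under its own key, so each key is mutated exactly once and DCP deduplication has nothing to collapse. The names and connection details below are hypothetical, using the Couchbase Java SDK 3.x:

```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;

public class ChangeArchiver {
    public static void main(String[] args) {
        // Hypothetical connection details and bucket name.
        Cluster cluster = Cluster.connect("127.0.0.1", "Administrator", "password");
        Collection archive = cluster.bucket("change-archive").defaultCollection();

        // Instead of overwriting "order::42" in place, write every version
        // under its own key. Each key sees exactly one mutation, so the
        // connector eventually delivers every version.
        JsonObject order = JsonObject.create().put("status", "shipped");
        String versionedKey = "order::42::v::" + System.currentTimeMillis();
        archive.insert(versionedKey, order);

        cluster.disconnect();
    }
}
```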

Thanks,
David
