Is it possible to get all mutations on a document in a DCP stream?
It seems that even when data is not compacted for a bucket, DCP returns only the latest copy of a document.
DCP SDK version 0.19
No. DCP will only ever return the latest version of a document - primarily because this is all the Data Service stores.
One more related question.
Does DCP retain the latest mutations for all keys indefinitely?
For example, in the first DCP run we get 10 mutations, all creates.
Then all docs are deleted.
If I run the next DCP pass a month later, starting from the previous endSeqNo, do I get 10 deletes?
If it retains the latest mutations indefinitely, does the first DCP run over a billion-key data set return millions of delete mutations for older docs?
The metadata purge interval, set when creating a bucket, determines when deletes (tombstones) are removed.
@drigby does DCP send any message or something to indicate it had purged records since the last backup?
Yes, it’s part of the negotiation when a DCP stream is established - see the protocol documentation at: kv_engine/protocol-flow.md at master · couchbase/kv_engine · GitHub
(This is probably where I should highlight that AFAIK the Java DCP client is unsupported; and I don’t know how much of this it exposes to clients…)
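To make that negotiation concrete, here is a tiny, hypothetical model of the server-side check (all names and logic are invented for illustration; see the linked protocol-flow.md for the real flow): the client opens a stream with its last known vbucket UUID and start seqno, and the server answers with a rollback point when that UUID is not in its failover log, or when the start seqno falls below the purge seqno.

```java
// Simplified, illustrative model of the DCP stream-open negotiation.
// These names and this logic are assumptions, not actual kv_engine code.
import java.util.List;

class StreamNegotiation {

    // A failover-log entry: (vbucket UUID, seqno at which that UUID began).
    record FailoverEntry(long vbUuid, long startSeqno) {}

    /**
     * Returns -1 if the stream can continue from clientStartSeqno,
     * otherwise the seqno the client must roll back to (0 in the worst case).
     */
    static long rollbackPoint(List<FailoverEntry> failoverLog,
                              long purgeSeqno,
                              long clientVbUuid,
                              long clientStartSeqno) {
        boolean uuidKnown = failoverLog.stream()
                .anyMatch(e -> e.vbUuid() == clientVbUuid);
        if (!uuidKnown) {
            // History branch the client never saw: roll back to 0.
            return 0;
        }
        if (clientStartSeqno < purgeSeqno) {
            // Tombstones the client needs may have been purged:
            // roll back to 0 so the client rematerializes everything.
            return 0;
        }
        return -1; // resume is fine
    }

    public static void main(String[] args) {
        List<FailoverEntry> log = List.of(new FailoverEntry(0xABCL, 0L));
        // Known UUID, start seqno above the purge seqno: no rollback.
        System.out.println(rollbackPoint(log, 100, 0xABCL, 500)); // prints -1
        // Start seqno below the purge seqno: rollback to 0.
        System.out.println(rollbackPoint(log, 1000, 0xABCL, 500)); // prints 0
    }
}
```

The second case in `main` is the "missed deletes" scenario discussed above: no failover happened, but the resume point predates the purge.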
Since the client missed some deletes, I am guessing that is considered a history branch and the client gets a rollback to 0.
So this seems to be a case where the client gets a rollback without any failover etc., i.e. it can happen on a single-node cluster too.
We are using https://github.com/couchbase/java-dcp-client.
Yes, I believe that’s correct. If you wait “too long” you can effectively be forced into rematerialization. That kind of rollback effectively always has to be handled. With supported products that use the DCP client (Kafka connector, Elasticsearch connector) we try to provide options for how you want the connector to behave when this occurs. In some cases, it’s “sit on your hands and ask for help”.
I’m curious what you’re looking to do with this @zxcvmnb. Is it possible to describe what you’re aiming for a bit more?
We use the Java SDK and DCP to get mutations regularly. However, some runs might fail, and the next successful run might come after a metadata purge has happened.
If we get a rollback to some previous seqno/timestamp, then we handle it.
In this case, since the vbucket owner UUID will not change, how does the SDK figure out it has to send a rollback?
Does it check my start seqno/timestamp against the last purged seqno/timestamp?
Generally a DCP client will stash state somewhere (not in the cluster) and use that as a starting point. The SDK doesn’t determine a rollback is needed; rather the cluster does, based on the requested sequence number. But you do need to manage possible vbucket UUID changes, where a sequence may go back to an earlier point.
Of course, as mentioned, this is not officially supported, but it is Open Source so you have the power yourself.
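As a sketch of the client side under those assumptions (this is not the java-dcp-client API; the class and method names here are invented), a client might stash a per-vbucket resume point between runs and reset it when the cluster answers a stream request with a rollback:

```java
// Illustrative client-side state handling for a DCP-style stream.
// This is a sketch with invented names, not the java-dcp-client API.
import java.util.HashMap;
import java.util.Map;

class StreamStateStore {

    // Per-vbucket resume point the client stashes between runs.
    record ResumePoint(long vbUuid, long seqno) {}

    private final Map<Integer, ResumePoint> state = new HashMap<>();

    /** Record progress after processing a mutation or snapshot marker. */
    void save(int vbucket, long vbUuid, long seqno) {
        state.put(vbucket, new ResumePoint(vbUuid, seqno));
    }

    /** Where to (re)start a vbucket stream; (0, 0) if we have no history. */
    ResumePoint resumePoint(int vbucket) {
        return state.getOrDefault(vbucket, new ResumePoint(0, 0));
    }

    /** The cluster told us to roll back: discard progress past that point. */
    void onRollback(int vbucket, long rollbackSeqno) {
        ResumePoint current = resumePoint(vbucket);
        state.put(vbucket, new ResumePoint(current.vbUuid(), rollbackSeqno));
        // A real client would also discard any downstream data derived from
        // seqnos above rollbackSeqno before restreaming from here.
    }
}
```

The key design point from the answer above: this state lives with the client, not the cluster, and the client only learns a rollback is needed when it tries to open the stream.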
Hi, I found info in the DCP docs indicating we might get multiple versions for a key. See https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/concepts.md.
In the Deduplication section:
“However, when multiple disk snapshots are merged logically into a single DCP backfill snapshot deduplication is not done.”
This seems to contradict the answer above. Am I misunderstanding something here?
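For what it’s worth, the quoted sentence can be modeled like this (a toy model reflecting my reading of the doc, not kv_engine code): each individual disk snapshot is deduplicated down to the latest mutation per key, but a backfill that logically merges several disk snapshots does not re-deduplicate across them, so a key can still appear once per merged snapshot.

```java
// Toy model of DCP deduplication as described in the kv_engine concepts doc:
// within one disk snapshot only the latest mutation per key survives, but
// merging snapshots into one backfill snapshot does not re-deduplicate.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupModel {

    record Mutation(String key, long seqno, String value) {}

    /** Deduplicate one disk snapshot: keep only the latest mutation per key. */
    static List<Mutation> dedupSnapshot(List<Mutation> mutations) {
        Map<String, Mutation> latest = new LinkedHashMap<>();
        for (Mutation m : mutations) {
            latest.put(m.key(), m); // later seqnos overwrite earlier ones
        }
        return new ArrayList<>(latest.values());
    }

    /** Merge disk snapshots into one backfill: no cross-snapshot dedup. */
    static List<Mutation> mergeForBackfill(List<List<Mutation>> snapshots) {
        List<Mutation> backfill = new ArrayList<>();
        for (List<Mutation> s : snapshots) {
            backfill.addAll(dedupSnapshot(s));
        }
        return backfill;
    }

    public static void main(String[] args) {
        List<Mutation> snap1 = List.of(
                new Mutation("k1", 1, "a"), new Mutation("k1", 2, "b"));
        List<Mutation> snap2 = List.of(new Mutation("k1", 3, "c"));
        // Within snap1, seqno 1 is deduplicated away; in the merged backfill,
        // k1 still appears twice (seqnos 2 and 3).
        System.out.println(mergeForBackfill(List.of(snap1, snap2)));
    }
}
```

On this reading there is no contradiction: "latest version only" holds per disk snapshot, while the backfill exception applies across snapshot boundaries.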