DCP stops getting data

zxcvmnb · April 29, 2019, 12:11pm

We use https://github.com/couchbase/java-dcp-client to periodically fetch data.
Documents are continuously being updated while DCP is sending mutations.
We first get the current sequence number of every vBucket, then start DCP streaming up to those sequence numbers.
When a StreamEnd message is received, we consider that vBucket done.

However, sometimes DCP just stops sending mutations, and the job fails after waiting for a minute or two.
From the logs: say I asked for sequence numbers up to 100; at the time of failure, the snapshot start seqno was something like 110 and the snapshot end seqno was something like 120.
So we had already received more data than we requested, but for some reason DCP did not send a STREAM_END.
Why does this happen?
According to https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/concepts.md, DCP streams from the vBucket master by default, so we should not hit the problem discussed in the last section, "Streaming from a replica vbucket".
Rollback mitigation is not enabled.
However, if it were enabled, would the DCP stream hang if a replica went down?

Also, by what point is data guaranteed to be persisted?
Say the DCP stream starts about 10 seconds after we get the sequence numbers.
Are those mutations persisted to disk by the time the stream starts?
Is it okay to close the DCP stream once memory-only snapshots start?

david.nault · April 29, 2019, 11:04pm

If flow control is enabled (and it really should be, otherwise you risk running out of heap space), the server might be waiting for you to acknowledge that you processed the events before it sends more. See the FlowControl sample code for a demo.
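To illustrate why a missing acknowledgement looks like a stalled stream, here is a minimal toy model of the byte accounting. It is a simplified sketch, not the java-dcp-client API or the server's actual implementation: the class FlowControlModel and its methods are hypothetical names invented for this example.

```java
// Toy model of DCP-style flow control. All names here are hypothetical;
// this is not the java-dcp-client API.
final class FlowControlModel {
    private final int bufferSize;   // bytes the server may have outstanding
    private int unackedBytes = 0;   // bytes sent but not yet acknowledged

    FlowControlModel(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Server side: send only if the event fits in the remaining window. */
    boolean trySend(int eventBytes) {
        if (unackedBytes + eventBytes > bufferSize) {
            return false; // window is full, so the stream appears to stop
        }
        unackedBytes += eventBytes;
        return true;
    }

    /** Client side: acknowledging processed bytes reopens the window. */
    void ack(int eventBytes) {
        unackedBytes = Math.max(0, unackedBytes - eventBytes);
    }

    int unackedBytes() {
        return unackedBytes;
    }
}
```

In this model, a client that never calls ack() eventually sees trySend() return false forever, which is what a hang caused by unacknowledged events would look like from the outside.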

It’s normal for the server to continue sending events until the client has received the complete snapshot that contains the requested end sequence number, but it sounds like you received data from the next snapshot as well. Is that correct? If so, that’s interesting, and is something we should look into.
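As a rough way to tell the two cases apart in your logs, the sketch below classifies a snapshot relative to the requested end sequence number. SnapshotCheck and Position are hypothetical helper names for this example only, not part of the client.

```java
// Hypothetical helper for log analysis; not part of java-dcp-client.
final class SnapshotCheck {
    enum Position { BEFORE_END, CONTAINS_END, PAST_END }

    /** Classifies the snapshot [snapshotStart, snapshotEnd] against requestedEnd. */
    static Position classify(long snapshotStart, long snapshotEnd, long requestedEnd) {
        if (snapshotEnd < requestedEnd) {
            return Position.BEFORE_END;   // more data is still expected
        }
        if (snapshotStart <= requestedEnd) {
            return Position.CONTAINS_END; // normal: server finishes this snapshot
        }
        return Position.PAST_END;         // snapshot lies entirely beyond the request
    }
}
```

With the numbers from your post (requested end 100, snapshot 110 to 120), classify(110, 120, 100) gives PAST_END, which is the case that would be worth investigating.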

The AirportsInFrance sample code periodically calls client.sessionState().isAtEnd() to see if the server has sent all of the requested data. You could try that approach instead of waiting for STREAM_END.
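A minimal sketch of that polling approach with a timeout might look like the following. EndPoller is a hypothetical helper, and the BooleanSupplier stands in for a call such as client.sessionState().isAtEnd(); the timeout and poll interval are values you would choose yourself.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; the supplier stands in for
// something like () -> client.sessionState().isAtEnd().
final class EndPoller {
    /** Returns true once isAtEnd reports true, or false if the timeout elapses first. */
    static boolean awaitEnd(BooleanSupplier isAtEnd, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isAtEnd.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

A call could then be written as EndPoller.awaitEnd(() -> client.sessionState().isAtEnd(), 120_000, 500), with the job deciding what to do when it returns false.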

No, the stream would not hang if a replica went down. The client would just have one less replica to poll.

There is no guarantee the data is ever persisted unless persistence polling (also known as “rollback mitigation”) is enabled. If persistence polling is enabled, you will only receive changes that have been persisted to the active vbucket and all available replicas.

Jira issue MB-31832 is an enhancement request for adding persistence guarantees to the DCP protocol, but this is unlikely to happen in the near future. For now, persistence polling is the best option if you care about persistence.

zxcvmnb · May 22, 2019, 8:06am

We have flow control enabled, and bufferAckWatermark is set to 20. We also acknowledge each event as shown in the FlowControl.java example.
Did you get a chance to check why we got more snapshots than we requested? We will call client.sessionState().isAtEnd() from now on, but the issue occurs rarely, so we don't know whether that fixes it.
If that call also returns false, can we safely exit with whatever we have? DCP should be able to pick up from wherever we left off, even if that was in the middle of a snapshot, right?
Okay, got this one.
Got this one too.
@david.nault

david.nault · May 28, 2019, 9:32pm

Cool! Just make sure to acknowledge the event (and release the buffer, usually at the same time) on every code path, otherwise small leaks can add up and consume the entire buffer.
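One way to cover every code path is a try/finally around the processing logic. The sketch below uses a hypothetical Event interface as a stand-in for the client's event buffer and flow controller, so treat the names as assumptions and adapt the calls to what the FlowControl sample actually uses.

```java
import java.util.function.Consumer;

// Sketch of "ack and release on every path". Event is a hypothetical
// stand-in, not a java-dcp-client type.
final class AckOnEveryPath {
    interface Event {
        void ack();     // tell the server the bytes were processed
        void release(); // return the buffer to the pool
    }

    static void handle(Event event, Consumer<Event> processor) {
        try {
            processor.accept(event);
        } finally {
            // Runs on normal return and when the processor throws.
            try {
                event.ack();
            } finally {
                event.release(); // still released even if ack() fails
            }
        }
    }
}
```

Early returns inside the processor are covered as well, since the finally block runs regardless of how the processing code exits.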

Did you get a chance to check why we got more snapshots than we requested?

Unfortunately I have not found time to investigate. Support via the public forum is provided on a “best effort” basis. If the suggested workaround is not sufficient, please file a Jira issue at https://issues.couchbase.com (to request an account, follow the “contact administrators” link in the login form). If you are an enterprise customer, please also reach out to the support team. I should probably also temper expectations by mentioning that the Java DCP client is not an officially supported product – but we still want to help you, of course.

DCP should be able to pick up from wherever we left off, even if that was in the middle of a snapshot, right?

Yes, that’s correct.

Thanks,
David
