DCP stream stops receiving data

We use https://github.com/couchbase/java-dcp-client to fetch data periodically.
Documents are continuously updated while DCP is sending mutations.
We first get the current sequence number of every vBucket, then start DCP streaming up to that sequence number.
When a StreamEnd message is received, we consider that vBucket done.
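The bookkeeping described above can be sketched as a small tracker. This is a hypothetical helper, not part of java-dcp-client: it records the target seqno captured for each vBucket before streaming, tracks the highest seqno seen, and treats a vBucket as done only when STREAM_END arrives. `pastTargetWithoutStreamEnd()` flags vBuckets that have already received data at or past their target but have not ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-vBucket progress tracker (not part of java-dcp-client).
public class StreamProgress {
    private final Map<Integer, Long> targetSeqno = new TreeMap<>();
    private final Map<Integer, Long> highestSeqno = new TreeMap<>();
    private final Map<Integer, Boolean> ended = new TreeMap<>();

    // Record the seqno captured for a vBucket before streaming starts.
    public void setTarget(int vbucket, long endSeqno) {
        targetSeqno.put(vbucket, endSeqno);
        highestSeqno.put(vbucket, 0L);
        ended.put(vbucket, false);
    }

    // Called for every mutation or deletion received on a vBucket.
    public void onMutation(int vbucket, long seqno) {
        highestSeqno.merge(vbucket, seqno, Long::max);
    }

    // Called when the server sends STREAM_END for a vBucket.
    public void onStreamEnd(int vbucket) {
        ended.put(vbucket, true);
    }

    // The job is complete only when every tracked vBucket has ended.
    public boolean allEnded() {
        return !ended.isEmpty() && !ended.containsValue(false);
    }

    // vBuckets that already reached their target but have no STREAM_END yet.
    public List<Integer> pastTargetWithoutStreamEnd() {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : targetSeqno.entrySet()) {
            int vb = e.getKey();
            if (!ended.get(vb) && highestSeqno.get(vb) >= e.getValue()) {
                result.add(vb);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StreamProgress progress = new StreamProgress();
        progress.setTarget(0, 100);
        progress.setTarget(1, 50);
        progress.onMutation(0, 115); // past the target, but no STREAM_END arrives
        progress.onMutation(1, 50);
        progress.onStreamEnd(1);
        System.out.println(progress.allEnded());                   // false
        System.out.println(progress.pastTargetWithoutStreamEnd()); // [0]
    }
}
```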

However, sometimes DCP just stops sending mutations, and the job fails after waiting for a minute or two.
From the logs: say I asked for sequence numbers up to 100; at the time of the failure, the snapshot start seqno was around 110 and the snapshot end seqno around 120.
So we had already received more data than we requested, but for some reason DCP does not send a STREAM_END.
Why does this happen?
According to https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/concepts.md, DCP streams from the active vBucket by default, so we should not hit the problem discussed in its last section, "Streaming from a replica vbucket".
Rollback mitigation is not enabled.
However, if it were enabled, would the DCP stream hang if a replica went down?

Also, at what point is data guaranteed to be persisted?
Say the DCP stream starts about 10 seconds after we fetch the sequence numbers.
Are those mutations persisted to disk by the time the stream starts?
Is it okay to close the DCP stream once memory-only snapshots start?

If flow control is enabled (and it really should be, otherwise you risk running out of heap space), the server might be waiting for you to acknowledge that you processed the events before it sends more. See the FlowControl sample code for a demo.
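To illustrate why missing acknowledgements stall a stream, here is a toy model of the flow-control window. The class and method names are hypothetical, not the java-dcp-client API: the server keeps at most `bufferSize` unacknowledged bytes in flight and stops sending until the client acknowledges some of them.

```java
// Toy model of DCP flow control (hypothetical, not the java-dcp-client API).
public class FlowControlModel {
    private final int bufferSize;  // window the client advertises, in bytes
    private int unackedBytes = 0;  // bytes sent but not yet acknowledged

    public FlowControlModel(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Server side: send an event only if it fits in the remaining window.
    public boolean trySend(int eventBytes) {
        if (unackedBytes + eventBytes > bufferSize) {
            return false; // server waits for an acknowledgement
        }
        unackedBytes += eventBytes;
        return true;
    }

    // Client side: acknowledge bytes after processing events.
    public void ack(int bytes) {
        unackedBytes = Math.max(0, unackedBytes - bytes);
    }

    public int unackedBytes() {
        return unackedBytes;
    }

    public static void main(String[] args) {
        FlowControlModel fc = new FlowControlModel(100);
        int sent = 0;
        while (fc.trySend(30)) {
            sent++; // three 30-byte events fit in a 100-byte window
        }
        System.out.println(sent + " sent, stalled at " + fc.unackedBytes()); // 3 sent, stalled at 90
        fc.ack(30);                         // client finishes one event
        System.out.println(fc.trySend(30)); // true: the window reopened
    }
}
```

If the client never calls the equivalent of `ack`, the loop above stays stalled forever, which from the outside looks like the server simply stopped sending.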

It’s normal for the server to continue sending events until the client has received the complete snapshot that contains the requested end sequence number, but it sounds like you received data from the next snapshot as well. Is that correct? If so, that’s interesting, and is something we should look into.
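The rule described above can be expressed as a small check. Assuming you record each snapshot marker's start and end seqno (the names here are hypothetical), a snapshot that ends before the requested end is routine, one that contains the requested end must still be delivered in full, and one that starts after it, like the 110-120 snapshot reported for a requested end of 100, is the unexpected case.

```java
// Hypothetical helper: where does a snapshot sit relative to the requested end seqno?
public class SnapshotCheck {
    public enum Position { BEFORE_END, CONTAINS_END, BEYOND_END }

    public static Position classify(long snapStart, long snapEnd, long requestedEnd) {
        if (snapEnd < requestedEnd) {
            return Position.BEFORE_END;   // keep streaming, more snapshots to come
        }
        if (snapStart <= requestedEnd) {
            return Position.CONTAINS_END; // expected: server finishes this whole snapshot
        }
        return Position.BEYOND_END;       // unexpected: snapshot starts after the requested end
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 90, 100));    // BEFORE_END
        System.out.println(classify(91, 105, 100));  // CONTAINS_END
        System.out.println(classify(110, 120, 100)); // BEYOND_END
    }
}
```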

The AirportsInFrance sample code periodically calls client.sessionState().isAtEnd() to see if the server has sent all of the requested data. You could try that approach instead of waiting for STREAM_END.
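A sketch of that polling approach with a timeout, assuming you pass in `() -> client.sessionState().isAtEnd()` as the condition. Here it is a plain `BooleanSupplier` so the example runs without a cluster; the `awaitEnd` helper and its timeout and interval parameters are hypothetical.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class AtEndPoller {
    // Polls atEnd until it returns true or the timeout elapses.
    // Returns true if the end was reached, false on timeout.
    public static boolean awaitEnd(BooleanSupplier atEnd, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (atEnd.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fake session state that reports "at end" on the third poll.
        boolean done = awaitEnd(() -> ++calls[0] >= 3, Duration.ofSeconds(1), Duration.ofMillis(10));
        System.out.println(done + " after " + calls[0] + " polls"); // true after 3 polls
        boolean reached = awaitEnd(() -> false, Duration.ofMillis(50), Duration.ofMillis(10));
        System.out.println(reached); // false: timed out
    }
}
```

Polling with an explicit deadline means a missing STREAM_END turns into a clear timeout you can log and act on, rather than an open-ended wait.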

No, the stream would not hang. The client would just have one less replica to poll.

There is no guarantee the data is ever persisted unless persistence polling (also known as “rollback mitigation”) is enabled. If persistence polling is enabled, you will only receive changes that have been persisted to the active vbucket and all available replicas.
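The behavior described in the last two answers can be modeled as a gate on sequence numbers. This is a simplified, hypothetical sketch, not the client's actual implementation: buffered events are released only once their seqno is persisted on the active copy and on every replica that is still available. A replica that goes away is simply dropped from the set being polled rather than blocking the stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of persistence polling; not the client's real implementation.
public class PersistenceGate {
    // Node name -> highest seqno that node reports as persisted.
    private final Map<String, Long> persistedSeqno = new TreeMap<>();
    // Seqnos received but not yet handed to the application.
    private final List<Long> pending = new ArrayList<>();

    public void updatePersisted(String node, long seqno) {
        persistedSeqno.put(node, seqno);
    }

    // A replica that goes away is no longer polled; it does not block the stream.
    public void nodeUnavailable(String node) {
        persistedSeqno.remove(node);
    }

    public void buffer(long seqno) {
        pending.add(seqno);
    }

    // Release buffered seqnos persisted on every node still being polled.
    public List<Long> release() {
        long safe = persistedSeqno.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
        List<Long> released = new ArrayList<>();
        Iterator<Long> it = pending.iterator();
        while (it.hasNext()) {
            long seqno = it.next();
            if (seqno <= safe) {
                released.add(seqno);
                it.remove();
            }
        }
        return released;
    }

    public static void main(String[] args) {
        PersistenceGate gate = new PersistenceGate();
        gate.updatePersisted("active", 120);
        gate.updatePersisted("replica1", 105);
        gate.buffer(100);
        gate.buffer(110);
        gate.buffer(120);
        System.out.println(gate.release()); // [100]
        gate.nodeUnavailable("replica1");
        System.out.println(gate.release()); // [110, 120]
    }
}
```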

Jira issue MB-31832 is an enhancement request for adding persistence guarantees to the DCP protocol, but this is unlikely to happen in the near future. For now, persistence polling is the best option if you care about persistence.

  1. We have flow control enabled, with bufferAckWatermark set to 20. We also acknowledge each event as shown in the FlowControl.java example.
  2. Did you get a chance to check why we got more snapshots than we requested? We will call client.sessionState().isAtEnd() from now on, but the issue occurs rarely, so we don't know yet whether that fixes it.
    Say that call also returns false: can we safely exit with whatever we have? DCP should be able to pick up from any point where we left off, even in the middle of a snapshot, right?
  3. Okay, got this one.
  4. Got this one too.

Cool! Just make sure to acknowledge the event (and release the buffer, usually at the same time) on every code path, otherwise small leaks can add up and consume the entire buffer.
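One way to guarantee the acknowledge-and-release step on every code path is a `try`/`finally` around the processing logic. In this sketch, `Event`, `ack()` and `release()` are hypothetical stand-ins for whatever your handler uses to acknowledge the bytes and free the buffer (see the FlowControl sample for the real calls).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class AckOnEveryPath {
    // Hypothetical stand-in for a DCP event held in a pooled buffer.
    public interface Event {
        void ack();     // acknowledge the event's bytes for flow control
        void release(); // return the buffer to the pool
    }

    // Runs the processor, then acks and releases even if processing throws.
    public static void handle(Event event, Consumer<Event> processor) {
        try {
            processor.accept(event);
        } finally {
            event.ack();
            event.release();
        }
    }

    public static void main(String[] args) {
        AtomicInteger acked = new AtomicInteger();
        AtomicInteger released = new AtomicInteger();
        for (int i = 0; i < 2; i++) {
            final boolean fail = (i == 1);
            Event event = new Event() {
                @Override public void ack() { acked.incrementAndGet(); }
                @Override public void release() { released.incrementAndGet(); }
            };
            try {
                handle(event, e -> {
                    if (fail) {
                        throw new IllegalStateException("bad document");
                    }
                });
            } catch (IllegalStateException ex) {
                System.out.println("processing failed: " + ex.getMessage());
            }
        }
        // prints "2 acked, 2 released": the failed event was still acknowledged
        System.out.println(acked.get() + " acked, " + released.get() + " released");
    }
}
```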

Did you get a chance to check why we got more snapshots than we requested?

Unfortunately I have not found time to investigate. Support via the public forum is provided on a “best effort” basis. If the suggested workaround is not sufficient, please file a Jira issue at https://issues.couchbase.com (to request an account, follow the “contact administrators” link in the login form). If you are an enterprise customer, please also reach out to the support team. I should probably also temper expectations by mentioning that the Java DCP client is not an officially supported product – but we still want to help you, of course.

DCP should be able to pick up from any point where we left off, even in the middle of a snapshot, right?

Yes, that’s correct.
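Resuming works because a DCP stream request carries the vBucket UUID, a start sequence number, and the snapshot range that start seqno belongs to. A minimal sketch of that bookkeeping, assuming you track it yourself; java-dcp-client keeps its own session state, so the class and field names below are hypothetical.

```java
// Hypothetical per-vBucket checkpoint; java-dcp-client keeps its own session state.
public class VBucketCheckpoint {
    // Parameters of a DCP stream request; field names are illustrative.
    public static final class ResumeRequest {
        public final long vbucketUuid;
        public final long startSeqno;
        public final long endSeqno;
        public final long snapshotStart;
        public final long snapshotEnd;

        ResumeRequest(long vbucketUuid, long startSeqno, long endSeqno,
                      long snapshotStart, long snapshotEnd) {
            this.vbucketUuid = vbucketUuid;
            this.startSeqno = startSeqno;
            this.endSeqno = endSeqno;
            this.snapshotStart = snapshotStart;
            this.snapshotEnd = snapshotEnd;
        }

        @Override
        public String toString() {
            return String.format("uuid=%d start=%d end=%d snapshot=[%d,%d]",
                    vbucketUuid, startSeqno, endSeqno, snapshotStart, snapshotEnd);
        }
    }

    private final long vbucketUuid;
    private long seqno;         // last seqno fully processed
    private long snapshotStart; // range of the most recent snapshot marker
    private long snapshotEnd;

    public VBucketCheckpoint(long vbucketUuid, long startSeqno) {
        this.vbucketUuid = vbucketUuid;
        this.seqno = startSeqno;
        this.snapshotStart = startSeqno;
        this.snapshotEnd = startSeqno;
    }

    public void onSnapshotMarker(long start, long end) {
        snapshotStart = start;
        snapshotEnd = end;
    }

    public void onProcessed(long processedSeqno) {
        seqno = processedSeqno;
    }

    // Build the request that resumes right after the last processed seqno.
    public ResumeRequest resume(long endSeqno) {
        if (seqno < snapshotStart) {
            // Marker received but nothing from that snapshot processed yet:
            // resume from the previous boundary with an empty snapshot range.
            return new ResumeRequest(vbucketUuid, seqno, endSeqno, seqno, seqno);
        }
        // Mid-snapshot: keep snapshotStart <= startSeqno <= snapshotEnd.
        return new ResumeRequest(vbucketUuid, seqno, endSeqno, snapshotStart, snapshotEnd);
    }

    public static void main(String[] args) {
        VBucketCheckpoint cp = new VBucketCheckpoint(12345L, 100);
        cp.onSnapshotMarker(101, 120);
        cp.onProcessed(101);
        cp.onProcessed(105); // job stops here, in the middle of the snapshot
        System.out.println(cp.resume(200)); // uuid=12345 start=105 end=200 snapshot=[101,120]
    }
}
```

The sketch ignores failover handling: if the vBucket UUID no longer matches the server's failover log, the server may ask the client to roll back instead of resuming.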