I have a Structured Streaming job that streams from Couchbase, with the persistence polling interval set to 100 ms. I observed a strange case: one batch carried a huge load, the Spark job went to a second attempt, and a few records from that batch were never processed. How can this happen? I am maintaining a checkpoint folder in HDFS.
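For reference, the job is wired up roughly like this. The `checkpointLocation` option is the standard Spark Structured Streaming API; the source format name, the console sink, and the HDFS path are illustrative placeholders, not my exact configuration:

```python
# Minimal sketch of the streaming job described above.
# Source format and paths are assumptions -- adjust to your connector version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("couchbase-stream").getOrCreate()

# Read stream from Couchbase; "couchbase.kv" is a placeholder format name.
stream = (
    spark.readStream
    .format("couchbase.kv")
    .load()
)

# The checkpointLocation on HDFS is what lets Spark recover source offsets
# when a batch is retried on a second attempt.
query = (
    stream.writeStream
    .format("console")
    .option("checkpointLocation", "hdfs:///user/spark/checkpoints/cb-job")
    .start()
)
query.awaitTermination()
```

This sketch needs a running Spark cluster and the Couchbase Spark connector on the classpath; it is a configuration outline, not a runnable test.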