Spring Data Couchbase application stuck indefinitely after upserting a document

The document is inserted in Couchbase and the main thread gets stuck after that.

Spring Boot Version: 2.3.1.RELEASE
Spring Data Couchbase: 4.3.5
reactor-core: 3.3.6.RELEASE
Couchbase Java Client: 3.2.7

"main" #1 prio=5 os_prio=0 tid=0x0000000002da4800 nid=0x19ec waiting on condition [0x0000000002d9b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000704bf6fa0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
at reactor.core.publisher.Mono.block(Mono.java:1710)
at org.springframework.data.couchbase.core.ExecutableUpsertByIdOperationSupport$ExecutableUpsertByIdSupport.one(ExecutableUpsertByIdOperationSupport.java:75)
at org.springframework.data.couchbase.repository.support.SimpleCouchbaseRepository.save(SimpleCouchbaseRepository.java:87)

Hi @neha.sahni. As a generic Project Reactor tip, I’ve seen issues before when I’ve accidentally used Reactor incorrectly by doing blocking operations inside reactive operators. Project Reactor (like any of these fibre-esque runtimes) works by chopping the work into small chunks (operators, in Reactor’s case) and running them on a generally very small pool of real threads. The chunks have to execute extremely quickly; if they block, they can easily saturate the underlying threads, and then no work can get done.

We don’t know for sure if it’s what’s happening here, as there’s not much information to go on. But it would be my starting guess.
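To make the saturation effect concrete, here is a minimal sketch that uses a plain JDK fixed thread pool as a stand-in for Reactor’s worker threads. It is an analogy, not Reactor code: the class name PoolSaturationDemo and the method quickTaskCompletes are made up for illustration. Once every pool thread is parked in a blocking call, even a trivial task cannot run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class: a small JDK pool stands in for Reactor's worker threads.
final class PoolSaturationDemo {

    /**
     * Submits `blockers` tasks that park until released, then one trivial task.
     * Returns true if the trivial task finished within waitMillis.
     */
    static boolean quickTaskCompletes(int poolSize, int blockers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockers; i++) {
                pool.submit(() -> {
                    try {
                        release.await(); // stands in for a blocking call inside an operator
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The trivial task needs a free pool thread to run at all.
            Future<String> quick = pool.submit(() -> "done");
            try {
                quick.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every thread was busy blocking
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();  // let the blockers finish
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads, 2 blockers: " + quickTaskCompletes(2, 2, 300));
        System.out.println("2 threads, 1 blocker:  " + quickTaskCompletes(2, 1, 300));
    }
}
```

With two threads and two blockers the trivial task times out; with one blocker it completes, because one thread is still free.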

Reactor provides a tool, BlockHound (reactor/BlockHound on GitHub), a Java agent to detect blocking calls from non-blocking threads, that I’ve found very useful for detecting such issues in my code.

Thank you for the tip!
I tried the BlockHound tool, but it flagged every slow call, including ones that were not a problem on my main thread. Also, the thread dump points to exactly where the issue is, but since that is internal to the Spring Data Couchbase classes, I can’t understand why the CountDownLatch never counts down to 0.

"main" #1 prio=5 os_prio=0 tid=0x0000000003614800 nid=0x5f14 waiting on condition [0x000000000360b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000709cf3808> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
at reactor.core.publisher.Mono.block(Mono.java:1679)
at org.springframework.data.couchbase.core.ExecutableUpsertByIdOperationSupport$ExecutableUpsertByIdSupport.one(ExecutableUpsertByIdOperationSupport.java:75)

Unfortunately, the stack trace does not indicate exactly what the problem is. It’s deep in the depths of Reactor, and we can see that a Mono is waiting for its result, but there could be many reasons why that result can’t be delivered. Based on several years’ experience working with Reactor, I would say that the #1 cause of these kinds of lockups is what I’ve identified above: an application error blocking the underlying thread pool. If BlockHound is identifying issues, that goes a long way towards supporting that theory, so I’d strongly recommend resolving those issues as the next step.
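As a rough illustration of what the thread dump shows, here is a simplified model of a blocking wait built on a CountDownLatch. This is a sketch, not Reactor’s actual BlockingSingleSubscriber: the names BlockingWaitSketch and ResultHolder are hypothetical. It shows that the waiter is released only when some other thread delivers a result, and that a bounded wait turns a silent hang into a visible failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of "block until a result is delivered".
final class BlockingWaitSketch {

    /** A result slot guarded by a latch that is counted down on delivery. */
    static final class ResultHolder<T> {
        private final CountDownLatch done = new CountDownLatch(1);
        private final AtomicReference<T> value = new AtomicReference<>();

        /** Called by the producing thread when the result is ready. */
        void complete(T result) {
            value.set(result);
            done.countDown();
        }

        /** Waits up to timeoutMillis and throws if nothing was delivered in time. */
        T get(long timeoutMillis) {
            try {
                if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException("No result within " + timeoutMillis + " ms");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
            return value.get();
        }
    }

    public static void main(String[] args) {
        // Normal case: another thread delivers the result and the waiter is released.
        ResultHolder<String> delivered = new ResultHolder<>();
        new Thread(() -> delivered.complete("doc-1")).start();
        System.out.println(delivered.get(1000));

        // Lost-signal case: nobody calls complete(), so an unbounded await would park forever.
        ResultHolder<String> lost = new ResultHolder<>();
        try {
            lost.get(200);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An unbounded await in the second case would look just like the dump above: the main thread parked on the latch while the signal that should release it never arrives.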

A couple of other suggestions:

  • You could post a small code snippet of how you’re using the API, or a very small project, and we could take a quick look.
  • You could update the software; those are some rather old versions.
  • Do you need reactor? The blocking APIs might be easier to use.
public String createCatalogLock() {
    final CatalogLock lock = new CatalogLock();

    logger.info("[COUCHBASE] Enabling catalog creation lock");

    final CatalogLock entity = lockRepository.save(lock); // The thread hangs here
    return entity.getId(); // Execution never reaches this line
}

lockRepository is a CouchbaseRepository

public interface CatalogLockRepository extends CouchbaseRepository<CatalogLock, String> {
}

I tried updating spring-data-couchbase and the Java SDK client to various combinations. Next, I plan to upgrade Spring Boot to 2.6.6.

I am not using Reactor directly in my code. Spring Data Couchbase uses it internally (org.springframework.data.couchbase.core.ExecutableUpsertByIdOperationSupport).

BlockHound also points to the same stack trace as the thread dump:
Caused by: reactor.blockhound.BlockingOperationError: Blocking call! sun.misc.Unsafe#park
at sun.misc.Unsafe.park(Unsafe.java)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
at reactor.core.publisher.Mono.block(Mono.java:1703)
at org.springframework.data.couchbase.core.ExecutableUpsertByIdOperationSupport$ExecutableUpsertByIdSupport.one(ExecutableUpsertByIdOperationSupport.java:75)
at org.springframework.data.couchbase.repository.support.SimpleCouchbaseRepository.save(SimpleCouchbaseRepository.java:87)

Ah, this does change things. I’m not sure then - perhaps one of our Spring Data gurus will be able to help.

BlockHound also points to the same stack trace as the thread dump

Well, BlockHound is identifying that there is a blocking call inside lockRepository.save(lock), and it’s correct about that. But this should only be a problem if your createCatalogLock() call is ultimately inside a reactive operator. Since you’re not using Reactor yourself, that shouldn’t be the case. Can you confirm? If so, I think this is a false positive.
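One quick way to sanity-check which thread is calling save() is to log its name just before the call. The sketch below is only a heuristic, and it rests on an assumption: that Reactor’s default schedulers are in use, whose non-blocking worker threads are named with the prefixes "parallel-" and "single-". The class and method names are hypothetical. With reactor-core on the classpath, Schedulers.isInNonBlockingThread() is the more reliable check.

```java
// Hypothetical helper: name-based heuristic, assuming Reactor's default scheduler thread names.
final class CallerThreadCheck {

    // Assumption: default, unrenamed Reactor schedulers ("parallel-N", "single-N").
    private static final String[] NON_BLOCKING_PREFIXES = {"parallel-", "single-"};

    /** Returns true if the thread name matches a default non-blocking Reactor worker. */
    static boolean looksLikeReactorWorker(String threadName) {
        for (String prefix : NON_BLOCKING_PREFIXES) {
            if (threadName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Describes the calling thread, for logging just before a blocking call. */
    static String describeCurrentThread() {
        String name = Thread.currentThread().getName();
        return name + (looksLikeReactorWorker(name)
                ? " (looks like a Reactor non-blocking worker)"
                : " (ordinary thread)");
    }

    public static void main(String[] args) {
        System.out.println(describeCurrentThread());
    }
}
```

In the thread dumps above the caller is "main", which this heuristic would report as an ordinary thread, consistent with the blocking call itself not being the bug.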

Yes, I confirm: the createCatalogLock() call is not inside a reactive operator.

Is there a specific forum for Spring Data Couchbase?

No, this is the correct place.

Can someone please direct the question to Spring Data experts?

I upgraded my application to Spring Boot 2.6.6 and that resolved the problem. Thank you for all the tips and prompt responses! You can mark this as resolved.
