SDK utilizes only one IO thread from the pool

I started my project with Spring Data Couchbase and its repositories. Then I noticed that beyond a certain load the service and the Couchbase cluster together hit a kind of glass ceiling. The load goes from 1 service to 6 servers in the Couchbase cluster at approximately 12.5K rps. I fired up a Java profiler, and here’s what it shows from the service’s perspective:

As a result, here’s what the server’s CPU load looks like:

Basically, the IO thread burns its own CPU core. It’s clearly the bottleneck.

It’s quite obvious that Spring Data Couchbase uses sync operations, so I ditched it and turned the code into a regular Bucket.async().get(..., RawJsonDocument.class).map(...) with, well, toBlocking().single() before it leaves the web service method. Surprisingly, nothing changed in the CPU and CB request pattern: one core is still at ~100%, with up to ~12.5K requests on CB.

By the nature of the project, I don’t make many requests to CB per service call; it’s usually 1-1 or 2-1. Is there any way for the SDK to utilize more than one IO thread in such a scenario?

That’s weird. Can you describe what kind of load pattern you have? I’m asking because normally the IO threads multiplex the open connections, so they should be utilized more or less evenly. The first explanation that comes to mind is that for some reason only (or mainly) the sockets on this one event loop are utilized and the others are just idling.
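To make the event-loop explanation concrete, here is a toy model in plain Java. It is not Couchbase SDK or Netty code. It assumes that each open socket is pinned to one IO thread in round-robin order and stays there, and the class name, method name, and traffic numbers are all made up for illustration.

```java
import java.util.Arrays;

/** Toy model only: not Couchbase SDK or Netty code. */
public class EventLoopModel {

    /**
     * Pins socket i to event loop (i % ioPoolSize), an assumed round-robin
     * assignment, then sums the requests each loop has to process.
     *
     * @param ioPoolSize        number of IO threads (event loops)
     * @param requestsPerSocket requests sent over each open socket
     * @return requests handled by each event loop
     */
    public static long[] loadPerLoop(int ioPoolSize, long[] requestsPerSocket) {
        long[] load = new long[ioPoolSize];
        for (int socket = 0; socket < requestsPerSocket.length; socket++) {
            load[socket % ioPoolSize] += requestsPerSocket[socket];
        }
        return load;
    }

    public static void main(String[] args) {
        // Hypothetical: six sockets with evenly spread traffic.
        long[] even = {2000, 2000, 2000, 2000, 2000, 2000};
        System.out.println(Arrays.toString(loadPerLoop(3, even)));   // [4000, 4000, 4000]

        // Hypothetical: the same sockets, but almost all traffic uses socket 0.
        long[] skewed = {11500, 200, 200, 200, 200, 200};
        System.out.println(Arrays.toString(loadPerLoop(3, skewed))); // [11700, 400, 400]
    }
}
```

In this model a larger IO pool does not help the skewed case, because the busy socket still belongs to a single loop.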

Can you please also share your CouchbaseEnvironment configuration?

For simplicity, in this case I have 1 bucket with several different types of documents and a web endpoint in a Java server based on Undertow. Each request is very short-lived and makes 1 query by ID to Couchbase, so it’s 1-1. I do load testing with ApacheBench. The number of concurrent connections varies from 10 to 100-150. That’s quite close to the production behavior I expect to see further down the road.

    public Observable<A> findA(String id) {
        return this.bucket.async().get(A.id(id), RawJsonDocument.class).map(this::readA);
    }
    private A readA(RawJsonDocument json) {
        return this.jsonMapper.readValue(json.content(), A.class);
    }

Later on I added a second Couchbase call per request, like this:

        Observable<A> aObservable = observable.flatMap(key -> this.repository.findA(request.a()));
        Observable<B> bObservable = observable.flatMap(f -> this.repository.findB(request.b()));
        Observable.zip(observable, aObservable, bObservable, C::zip)

That actually led to 2 IO threads being utilized!
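The shape of the zipped version can be pictured with plain JDK futures. This is a stand-in sketch: it uses CompletableFuture instead of RxJava and the Couchbase SDK, and the class name, the lookup methods, and their return values are hypothetical. It only shows that both lookups are started before either result is awaited, so one service call has two requests in flight at once, which may be why a second IO thread picked up work.

```java
import java.util.concurrent.CompletableFuture;

/** Stand-in sketch using JDK futures, not RxJava or the Couchbase SDK. */
public class ZipSketch {

    // Hypothetical lookup; stands in for one asynchronous get by ID.
    public static CompletableFuture<String> findA(String id) {
        return CompletableFuture.supplyAsync(() -> "A:" + id);
    }

    // Hypothetical lookup for the second document type.
    public static CompletableFuture<String> findB(String id) {
        return CompletableFuture.supplyAsync(() -> "B:" + id);
    }

    /** Starts both lookups before waiting on either, then merges the results. */
    public static CompletableFuture<String> findBoth(String aId, String bId) {
        CompletableFuture<String> a = findA(aId); // in flight immediately
        CompletableFuture<String> b = findB(bId); // in flight alongside a
        return a.thenCombine(b, (x, y) -> x + "+" + y);
    }

    public static void main(String[] args) {
        System.out.println(findBoth("1", "2").join()); // prints "A:1+B:2"
    }
}
```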

The Couchbase environment is quite standard; I’ve only made the IO and computation pool sizes configurable.

    @Override
    protected CouchbaseEnvironment getEnvironment() {
        return DefaultCouchbaseEnvironment.builder()
                .ioPoolSize(this.ioPoolSize)
                .computationPoolSize(this.computationPoolSize)
                .build();
    }

I’ve experimented with various pool sizes, but that doesn’t change the picture. It’s probably worth mentioning that Undertow has its own IO pool for serving connections asynchronously.