Issue with reactive bulk get operation

Hi,

I am seeing issues while doing a bulk get using reactive code. Here is the code:

List<JsonDocument> documents = Observable.from(subList)
        .flatMap(new Func1<String, Observable<JsonDocument>>() {
            @Override
            public Observable<JsonDocument> call(String metaId) {
                if (metaId != null) {
                    System.out.println("Sending: " + metaId);
                    return asyncBucket.get(metaId);
                }
                return Observable.empty();
            }
        }, 100) // at most 100 concurrent inner gets
        .doOnNext(t -> System.out.println("Received: " + t.id()))
        .toList()
        .toBlocking()
        .single();

The sublist contains 100 elements, all of which exist in the database. The "Sending" log shows that requests for all 100 items go out, but I only ever receive responses for 90 of them (always), and then I get this warning:

Jul 12, 2018 1:35:26 PM com.couchbase.client.core.endpoint.AbstractGenericHandler$KeepAliveResponseAction onError
WARNING: [/127.0.0.1:8093][QueryEndpoint]: Got error while consuming KeepAliveResponse.
java.util.concurrent.TimeoutException
	at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166)
	at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191)
	at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$2.call(EventLoopsScheduler.java:189)
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Am I doing something incorrectly?

I am using java-client 2.5.9 and rx-java 1.3.8. Here are my Couchbase configuration details:

INFO: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslTruststoreFile='null', sslKeystorePassword=false, sslTruststorePassword=false, sslKeystore=null, sslTruststore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=3, computationPoolSize=3, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=12, queryServiceEndpoints=12, searchServiceEndpoints=12, configPollInterval=2500, configPollFloorInterval=50, ioPool=NioEventLoopGroup, kvIoPool=null, viewIoPool=null, searchIoPool=null, queryIoPool=null, coreScheduler=CoreScheduler, memcachedHashingStrategy=DefaultMemcachedHashingStrategy, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.5.9 (git: 2.5.9, core: 1.5.9), retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, continuousKeepAliveEnabled=true, keepAliveErrorThreshold=4, keepAliveTimeout=2500, autoreleaseAfter=75000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=30000, callbacksOnIoPool=false, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2@43b6a5a6, certAuthEnabled=false, coreSendHook=null, forceSaslPlain=false, queryTimeout=75000, viewTimeout=75000, searchTimeout=75000, analyticsTimeout=75000, kvTimeout=2500, connectTimeout=30000, dnsSrvEnabled=false}

Fixed it by using a RetryHandler.
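Presumably something along these lines with the SDK's RetryBuilder (a sketch; the timeout, exception type, retry count, and delay values are all just examples):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.util.retry.RetryBuilder;

// Inside the flatMap, instead of returning asyncBucket.get(metaId) directly:
return asyncBucket.get(metaId)
        .timeout(2500, TimeUnit.MILLISECONDS)                     // fail an attempt after 2.5s
        .retryWhen(RetryBuilder
                .anyOf(TimeoutException.class)                    // retry only on timeouts
                .delay(Delay.exponential(TimeUnit.MILLISECONDS, 100)) // back off, capped at 100ms
                .max(3)                                           // give up after 3 retries
                .build());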

If it’s just taking a while every once in a while, you might try setting a higher timeout for the operation instead. That would be less work on the system than retrying things that time out. Any other thoughts @subhashni or @daschl?
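On the async API that would look something like this (a sketch; 10 seconds is an arbitrary value, and note that with raw Observables you apply the timeout yourself via RxJava's timeout operator rather than through the environment's kvTimeout):

// Inside the flatMap, give each get a longer deadline than the 2.5s default:
return asyncBucket.get(metaId)
        .timeout(10, TimeUnit.SECONDS); // needs java.util.concurrent.TimeUnit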


@subhashni @daschl @ingenthr

Hi,

I used 2.6.0, and here are the OpenTracing logs while connecting to a localhost Couchbase instance:

https://pastebin.com/sM5dmdcw

Can you suggest what might be causing the issue? I am doing a bulk get of 500 IDs from Couchbase using the reactive API, and it is taking 3297 milliseconds.

Thanks for providing the log; it's helpful to look at.

One thing you need to consider is that even though you are fetching 100 documents in parallel, technically you are not, because you only connect to localhost (one machine) over one socket. So the requests are serialized over the network, even if the JVM code is non-blocking. Can you try your code on a cluster of, say, 3 nodes with the KV service enabled and see if it performs better? For a test environment you could also try increasing kvEndpoints to 2 or 3 and see if it makes a difference (see the sketch below), but note that I recommend this only for the single localhost environment; for prod, I think adding more nodes will help here too.
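For the localhost test, bumping the number of KV sockets would look roughly like this with the 2.x environment builder (a sketch; the value of 3 is just an example):

import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

// Open 3 sockets per node for the KV service instead of the default 1,
// so gets against a single node are not all serialized over one connection.
CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .kvEndpoints(3)
        .build();
CouchbaseCluster cluster = CouchbaseCluster.create(env, "127.0.0.1");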

Also note that some operations take a really long time to complete, for example:

{
  "operation_name" : "get",
  "server_us" : 11,
  "last_local_id" : "7E661647ABBEAB37/0000000018E10958",
  "last_local_address" : "127.0.0.1:51526",
  "last_remote_address" : "127.0.0.1:11210",
  "last_dispatch_us" : 59930,
  "last_operation_id" : "0x7",
  "total_us" : 3171838
}

This operation spent about 60 ms in the server and network (which on localhost is negligible), but it took 3,171,838 microseconds in total to complete, so over 3 seconds. And the server itself reported 11 microseconds versus a 60 ms dispatch time. All of this points to your JVM and/or OS being overloaded, doing GC and whatnot. Have you done any profiling of the system, or run with GC logs enabled, to rule that out?

I am getting similar results when running it on a cluster of 3 nodes. If you want, I can deploy it as a Docker application and share the results from running it in a cluster; it will take a while, but we are seeing the same behaviour there too. I will also attach the code so that you can check whether I am doing something incorrectly.

I have shared the logs in your Couchbase forum inbox.