Latency spikes in query execution

Hi there.
I’m struggling to migrate SDK 2 code to SDK 3.0.
I have a query that I run to fetch an item by a secondary (unique) key. With the SDK 2 implementation, latency was usually below 40ms. The query is simple, works on an index, and accepts this id as a parameter. The query rate is about 1-3 per second. The process also involves KV operations, such as getting entities and counter get/increment.
When running the same functionality with SDK 3 (3.0.4), after about 1 hour of running I get latency spikes of about 1-3 seconds. There’s no rise in the rps; it just starts to slow down. I activated metrics on this request and saw that the server execution time is 10ms, even for these long requests. Other services in our cluster, which work with SDK 2, don’t experience any latency problems.
Again, this always occurs after many, many successful requests, but once the latency starts, I get a lot of slow requests like that.
A note about the migration process: the original code with SDK 2.0 works with the DSL library you removed from SDK 3. I’m not sure whether that should have any effect.
I tried to play with maxHttpConnection, but it didn’t seem to help in any way.
Do you have some advice on how to solve it?

thanks,
Asher

@ashernave that’s definitely something we need to investigate. Would it be possible for you to attach a JFR and run it over the time period so we can see what happens when it works fine vs. when the latency becomes higher? Also one thing would be to check out debug logs and see what happens when it runs nicely vs. when it slows down.

Also, if there is some standalone code that reproduces it for you that I can try I’m happy to give it a spin locally.

@daschl
After several tests in which I couldn’t reproduce it in a sandbox environment with a high request rate, I’ve found the scenario that causes this problem. It occurs when the request rate actually becomes very low. On our server, the rate of this query starts at about 20 rps but decreases over time and, as I described above, drops to around 1 rps.
Probably the connection then closes (becomes idle), and the next request takes a lot of time (I saw 2-6 seconds).
I can change the idle timeout but I want to be able to handle traffic spikes. This problem also happens when the server starts - is there a way to open query connections on startup and leave them open?
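One application-level way to approach the question above is to issue a cheap request at startup and again whenever the query path has been quiet for a while, so that a connection is less likely to sit idle long enough to be closed. The following is only a minimal sketch under that assumption, not an SDK feature: `QueryKeepWarm`, `warmUpQuery`, and `maxIdleMillis` are hypothetical names, and `warmUpQuery` stands in for whatever lightweight query the application chooses to run. It also treats only the symptom; it does not explain why reconnecting is slow.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Runs a cheap warm-up action once at startup and again whenever no
 * activity has been recorded for half of the configured idle interval.
 * All names here are illustrative; nothing in this class is part of an SDK.
 */
final class QueryKeepWarm implements AutoCloseable {
    private final Runnable warmUpQuery;   // hypothetical: wraps a trivial query
    private final long maxIdleNanos;      // should be below the connection idle timeout
    private final AtomicLong lastActivityNanos = new AtomicLong(System.nanoTime());
    private final ScheduledExecutorService scheduler;

    QueryKeepWarm(Runnable warmUpQuery, long maxIdleMillis) {
        this.warmUpQuery = warmUpQuery;
        this.maxIdleNanos = TimeUnit.MILLISECONDS.toNanos(maxIdleMillis);
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "query-keep-warm");
            t.setDaemon(true); // do not keep the JVM alive just for pings
            return t;
        });
    }

    /** Warms up immediately, then checks for idleness every maxIdle / 2. */
    void start() {
        runWarmUp();
        long periodNanos = Math.max(1, maxIdleNanos / 2);
        scheduler.scheduleAtFixedRate(this::warmUpIfIdle, periodNanos, periodNanos, TimeUnit.NANOSECONDS);
    }

    /** Call after every real query so that busy periods trigger no extra pings. */
    void recordActivity() {
        lastActivityNanos.set(System.nanoTime());
    }

    private void warmUpIfIdle() {
        if (System.nanoTime() - lastActivityNanos.get() >= maxIdleNanos / 2) {
            runWarmUp();
        }
    }

    private void runWarmUp() {
        try {
            warmUpQuery.run();
        } catch (RuntimeException e) {
            // A failed ping must not cancel the schedule; real code would log this.
        } finally {
            recordActivity();
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (QueryKeepWarm keepWarm = new QueryKeepWarm(() -> System.out.println("warm-up ping"), 200)) {
            keepWarm.start();
            Thread.sleep(500); // no real traffic here, so pings repeat while idle
        }
    }
}
```

Because the check runs at half the idle interval and fires once half the interval has passed without activity, the gap between two requests stays below `maxIdleMillis` as long as the ping itself returns quickly. Whether this helps depends on the ping reusing the same connections as the real query, which is an assumption to verify against the debug logs.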

@ashernave so opening an HTTP connection in your environment takes 2-6 seconds? Is that expected in your environment (what’s the latency between client and server)?

No, it’s maybe several milliseconds. The p99 for this query is normally 50ms, and we know the server execution time from the metrics is about 10ms.

@ashernave ok, then we need to figure out what causes the discrepancy, rather than just fixing the effect by keeping the connections open. Can you share debug logs from that environment covering a moment where such a socket open happens? Maybe there is something we can derive from them.
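To make that discrepancy visible alongside the debug logs, the client could record, for each query, how long the query path was idle beforehand, the latency seen by the client, and the execution time reported by the server. The sketch below is one hypothetical way to do that with plain Java; `IdleGapLatencyLog` and its methods are illustrative names, and the caller is assumed to supply the timestamps and the server execution time (for example from the query metrics mentioned earlier in the thread).

```java
import java.util.ArrayList;
import java.util.List;

/** Records each query's client-side latency together with the idle time before it. */
final class IdleGapLatencyLog {
    /** One observation; all durations are in milliseconds. */
    static final class Sample {
        final long idleBeforeMs;
        final long clientLatencyMs;
        final long serverExecutionMs;

        Sample(long idleBeforeMs, long clientLatencyMs, long serverExecutionMs) {
            this.idleBeforeMs = idleBeforeMs;
            this.clientLatencyMs = clientLatencyMs;
            this.serverExecutionMs = serverExecutionMs;
        }

        /** Time not explained by server-side execution (network, connect, queuing). */
        long unexplainedMs() {
            return clientLatencyMs - serverExecutionMs;
        }

        @Override
        public String toString() {
            return "idleBefore=" + idleBeforeMs + "ms client=" + clientLatencyMs
                    + "ms server=" + serverExecutionMs + "ms unexplained=" + unexplainedMs() + "ms";
        }
    }

    private final List<Sample> samples = new ArrayList<>();
    private long lastFinishedAtMs = -1; // -1 means no query has completed yet

    /** startMs and endMs are timestamps taken by the caller around the query call. */
    synchronized Sample record(long startMs, long endMs, long serverExecutionMs) {
        long idle = lastFinishedAtMs < 0 ? 0 : Math.max(0, startMs - lastFinishedAtMs);
        lastFinishedAtMs = Math.max(lastFinishedAtMs, endMs);
        Sample sample = new Sample(idle, endMs - startMs, serverExecutionMs);
        samples.add(sample);
        return sample;
    }

    /** Samples whose unexplained time exceeds the threshold. */
    synchronized List<Sample> slowerThan(long thresholdMs) {
        List<Sample> slow = new ArrayList<>();
        for (Sample s : samples) {
            if (s.unexplainedMs() > thresholdMs) {
                slow.add(s);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        IdleGapLatencyLog log = new IdleGapLatencyLog();
        log.record(0, 40, 10);            // made-up numbers: a normal request
        log.record(60_040, 62_540, 10);   // made-up numbers: slow request after a quiet minute
        for (Sample s : log.slowerThan(1_000)) {
            System.out.println(s);
        }
    }
}
```

If the slow samples consistently show a large `idleBeforeMs`, that supports the idle-connection theory; if they do not, the cause probably lies elsewhere. The timestamps of those samples also give a narrow window to look at in the debug logs.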

Yes, now that I know how to recreate it, I’ll enable the Couchbase SDK debug logs and let’s sync.

@daschl
I have a (filtered) log export regarding my connection issue. I would prefer to send it by direct e-mail if that’s possible.
Can you send me your e-mail or a support e-mail address that I can send it to?

Thanks,
Asher

@ashernave sent it to you via PM