Can you tell me which problem you are seeing? Also, can you identify where the JNI references are coming from? In general, though, I don't see why the use of JNI itself would be a problem.
We are facing issues during load testing, and I am currently looking at the heap dumps, GC logs, system logs, etc. at my disposal. Here is a list of the problems found:
Clients crash with OOM. A large number of JNI references is found in the histogram logs (as described earlier). Please let me know if the SDK uses JNI in any way that could cause memory leaks.
An "Operations over threshold" warning is seen with total_us > 7 seconds:
[
  {
    "top": [
      {
        "operation_name": "get",
        "last_local_id": "5C2AEB30919DC5A7/FFFFFFFF853D2B29",
        "last_local_address": "10.64.105.94:42010",
        "last_remote_address": "10.64.106.184:11210",
        "last_dispatch_us": 734,
        "decode_us": 70,
        "last_operation_id": "0x5034a55",
        "total_us": 7871814
      }
    ],
    "service": "kv",
    "count": 1
  }
]
No GC was running at the moment the above warning was seen.
lsof shows a huge number of connections (up to 60k) from one host to the Couchbase server, all in the ESTABLISHED state. No network errors are seen.
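For observations like this, one hedged way to see where the connections are coming from is to group the ESTABLISHED sockets by owning process. The port (11210, the KV service port mentioned in the warning above) and the exact lsof flags are assumptions; adjust for your environment:

```shell
# Sketch: count ESTABLISHED TCP connections to the KV port, grouped by
# command name and PID, to confirm whether they all belong to one JVM.
lsof -nP -iTCP:11210 -sTCP:ESTABLISHED \
  | awk 'NR > 1 {print $1, $2}' \
  | sort | uniq -c | sort -rn
```

If a single java PID accounts for tens of thousands of connections, that points at connection handling inside the application rather than at the server.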
Rebalancing on the Couchbase server takes 12~15 minutes, even with 9 empty ephemeral buckets. We are running only the data service on all the nodes, and this duration (12~15 min) is not reasonable.
We are running the SDK with a CouchbaseEnvironment using default values.
Any inputs/suggestions will be greatly appreciated.
@ravikrn.13 the only way we use JNI is through Netty, our IO layer. This might correspond with your observation here:
lsof shows a huge number of connections (up to 60k) from one host to the Couchbase server, all in the ESTABLISHED state. No network errors are seen.
Can you verify that those connections are coming from the JVM? Are you using the environment, cluster, and bucket as singletons? It is very important not to open a new connection every time you perform an operation; doing so can lead to exactly this behavior. Can you also share DEBUG-level logs for the entire duration of the run?
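The singleton advice above can be sketched with Java's initialization-on-demand holder idiom. This is a hypothetical, self-contained illustration: `ClusterHandle` is a stand-in for the real SDK objects; in an actual application, the place where it is constructed is where you would create the CouchbaseEnvironment, cluster, and bucket exactly once and reuse them for every operation.

```java
// Sketch of holding the expensive, socket-owning SDK objects as a
// process-wide singleton instead of creating them per operation.
public final class ClusterHolder {

    // Stand-in for the environment/cluster/bucket trio. In real code this
    // wrapper would hold the objects created once from the Couchbase SDK.
    static final class ClusterHandle {
        final String bootstrapHost;
        ClusterHandle(String bootstrapHost) {
            this.bootstrapHost = bootstrapHost;
        }
    }

    // Initialization-on-demand holder: the JVM guarantees the static
    // initializer runs exactly once, lazily, and thread-safely.
    private static final class Holder {
        static final ClusterHandle INSTANCE = new ClusterHandle("127.0.0.1");
    }

    private ClusterHolder() {}

    public static ClusterHandle instance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Every caller gets the same handle, so the number of open
        // connections stays bounded no matter how many ops are issued.
        System.out.println(instance() == instance()); // prints "true"
    }
}
```

Opening a fresh connection per operation, by contrast, leaks sockets at the rate of your request throughput, which matches the tens of thousands of ESTABLISHED connections seen in lsof.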