Endpoint not writeable

ashernave · May 17, 2020, 9:53am

Another error popped up after I migrated a service to Java SDK 3.0.
We make heavy use of atomic counters. I have a class that does get, getAndTouch and increment.
After the service has been running for a while (at hundreds of requests per second), I start getting timeout exceptions with the reason Endpoint Not Writable.
The exact logs are:

[com.couchbase.tracing][OverThresholdRequestsRecordedEvent][10s] Requests over Threshold found: [{"top":[{"operation_name":"GetRequest","server_us":0,"last_local_id":"D6021CA500000001/0000000016FCC870","last_local_address":"10.109.35.160:32284","last_remote_address":"10.108.0.96:11207","last_dispatch_us":815997,"last_operation_id":"0x10870b3","total_us":1926025},{"operation_name":"GetRequest","server_us":0,"last_local_id":"D6021CA500000001/0000000016FCC870","last_local_address":"10.109.35.160:32284","last_remote_address":"10.108.0.96:11207","last_dispatch_us":815991,"last_operation_id":"0x10870b2","total_us":1926022}

and then:

[com.couchbase.config][BucketConfigRefreshFailedEvent] Reason: INDIVIDUAL_REQUEST_FAILED, Type: KV, Cause: com.couchbase.client.core.error.RequestCanceledException: CarrierBucketConfigRequest {"cancelled":true,"completed":true,"coreId":"0xd6021ca500000001","idempotent":true,"reason":"NO_MORE_RETRIES (ENDPOINT_NOT_WRITABLE)","requestId":17455328,"requestType":"CarrierBucketConfigRequest","retried":0,"service":{"bucket":"ad-stats","collection":"_default","opaque":"0x10a439b","scope":"_default","target":"10.108.0.96","type":"kv"},"timeoutMs":2500} {"coreId":"0xd6021ca500000001"}

From this point on, all requests fail.
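As a reading aid for the threshold log above, here is a minimal Java sketch that compares total_us with last_dispatch_us for the first GetRequest entry. It assumes, from the field names only, that total_us is the full request duration and last_dispatch_us covers the final dispatch to the server. The class and method names are hypothetical and not part of the SDK.

```java
import java.time.Duration;

// Hypothetical helper, not part of the Couchbase SDK.
final class ThresholdLogTimings {

    // Part of total_us that is NOT covered by the last dispatch.
    // Assumption: both fields are microseconds, as the "_us" suffix suggests.
    static long outsideLastDispatchMicros(long totalUs, long lastDispatchUs) {
        return totalUs - lastDispatchUs;
    }

    // Convert a microsecond count into a Duration for easier printing.
    static Duration fromMicros(long micros) {
        return Duration.ofNanos(micros * 1_000L);
    }

    public static void main(String[] args) {
        // Values copied from the first GetRequest entry in the log above.
        long totalUs = 1_926_025L;
        long lastDispatchUs = 815_997L;

        long restUs = outsideLastDispatchMicros(totalUs, lastDispatchUs);

        System.out.println("total:                 " + fromMicros(totalUs).toMillis() + " ms");
        System.out.println("last dispatch:         " + fromMicros(lastDispatchUs).toMillis() + " ms");
        System.out.println("outside last dispatch: " + fromMicros(restUs).toMillis() + " ms");
    }
}
```

Under that assumption, more than half of each request's time falls outside the last dispatch, which would fit requests waiting or being retried on the client before they are finally written. The log alone does not confirm this.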

Note that I tried several KV circuit breaker configurations, disabling the circuit breaker entirely, and several retry strategies (best effort and fail fast).
The previous code, on SDK 2, worked fine.

My question is: what can cause this issue, and how can I avoid it?

Thanks,
Asher

daschl · May 18, 2020, 6:36am

Circuit breakers are disabled by default. From your previous post I saw that you are setting the FailFastStrategy - are you setting it as the default on the environment? If so, note that it is no longer supported and is marked as internal. What is the motivation behind using it?

Separately, can you share debug logs (maybe via PM) from when the issues are happening?

ashernave · May 18, 2020, 11:28am

I set the retry strategy on the incrementOptions/getOptions, not on the environment config, and I actually tested bestEffortStrategy, FailFastStrategy and also a custom one. All ended with the same results. I can't share the logs, since it is a production machine. The logs you see above are from exactly the moment when the endpoint not writable errors start. I can't understand why I get so many retries, or why I get the timeout at all. I also played with maxKvConnections (not sure what the number should be).
