Cannot connect KV to cluster behind NAT router

I have some code that works locally, but when I try to reach the cluster from a remote location I cannot get it to work.

I have two nodes in my cluster which externally share the same IP address (the IP address of my router). Each node is forwarded on its own distinct external port range in place of the standard 8091-8096 and 11210 ports, and I have configured both nodes with the appropriate external alternate address (the same address for each, since they share the router’s IP).
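
Roughly, my configuration script does the equivalent of the following on each node (a Java sketch of the alternate-address REST call; the address, credentials, and ports here are placeholders rather than my real values):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: set the "external" alternate address and ports for one node via
// PUT /node/controller/setupAlternateAddresses/external. All values are placeholders.
public class SetAlternateAddress {
  public static void main(String[] argv) throws Exception {
    URL url = new URL("http://internal.node.address:8091/node/controller/setupAlternateAddresses/external");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    String creds = Base64.getEncoder()
      .encodeToString("Administrator:password".getBytes(StandardCharsets.UTF_8));
    conn.setRequestProperty("Authorization", "Basic " + creds);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    // Map the standard service ports to the externally forwarded ones.
    String form = "hostname=external.ip.address&mgmt=42091&kv=42210";
    try (OutputStream os = conn.getOutputStream()) {
      os.write(form.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}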

I am able to connect to each of the servers from outside using the appropriate REST ports in my browser and everything looks good.

In my Java code, the connection appears to succeed and everything runs smoothly until my first operation, which times out. I suspect that what is actually timing out is the bucket open; because opening is lazy, the failure only surfaces when I attempt the upsert.
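
One way I could confirm that would be to force the bucket open to fail fast instead of waiting for the first operation, e.g. (assuming my SDK build has waitUntilReady; Duration is already imported in the code below):

    Bucket bucket = cluster.bucket(bucketName);
    // Surface bootstrap/open failures here rather than on the first KV operation.
    bucket.waitUntilReady(Duration.ofSeconds(10));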

This is about the simplest code I can use to reproduce the problem:

package test.ziniki.characterization.couchbase30;

import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.couchbase.client.core.env.IoConfig;
import com.couchbase.client.core.env.NetworkResolution;
import com.couchbase.client.core.env.SeedNode;
import com.couchbase.client.core.env.TimeoutConfig;
import com.couchbase.client.core.retry.BestEffortRetryStrategy;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.ClusterOptions;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.env.ClusterEnvironment;
import com.couchbase.client.java.json.JsonObject;

public class MainCB30 {
  public static final Logger logger = LoggerFactory.getLogger("Test");
  
  public static void main(String[] argv) throws Exception {
    String host = System.getenv("COUCHBASE30_HOST");
    String pwd = System.getenv("CBTEST_PASSWORD");
    String bucketName = System.getenv("BUCKET");
    ClusterEnvironment env = ClusterEnvironment.builder()
      .ioConfig(IoConfig.builder()
        .enableDnsSrv(false)
        .networkResolution(NetworkResolution.EXTERNAL))
//    .timeoutConfig(TimeoutConfig.builder()
//      .connectTimeout(Duration.of(60, ChronoUnit.SECONDS))
//      .kvTimeout(Duration.of(60, ChronoUnit.SECONDS)))
      .retryStrategy(BestEffortRetryStrategy.INSTANCE)
      .build();
    Set<SeedNode> seedNodes = new HashSet<>(Arrays.asList(
      SeedNode.create(host,
        Optional.of(42210),
        Optional.of(42091)),
      SeedNode.create(host,
        Optional.of(52210),
        Optional.of(52091))
    ));
    Cluster cluster = Cluster.connect(seedNodes, ClusterOptions.clusterOptions("test", pwd).environment(env));
    logger.info("connected");
    Bucket bucket = cluster.bucket(bucketName);
    Collection syncColl = bucket.defaultCollection();
    try {
      syncColl.upsert("exists", JsonObject.jo().put("exists", true));
      logger.info("done upsert");
    } finally {
      cluster.disconnect();
      logger.info("disconnected");
    }
  }
}

As you can see, I have experimented with extending the timeout but that just slows down the failure and produces larger log files :slight_smile:

I couldn’t attach the complete log file because it wasn’t an acceptable file type, but I think the key information is contained in these lines:

20200212-20:23:08.065 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceAddedEvent][99ms] Service added to Node {"coreId":1,"remote":"external.ip.address:42210","type":"KV"}
20200212-20:23:08.067 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":1,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x0","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.068 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][1000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":1,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x0","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.068 com.couchbase.node/ts INFO: [com.couchbase.node][NodeConnectedEvent] Node connected {"coreId":1,"managerPort":"52091","remote":"external.ip.address"}
20200212-20:23:08.068 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceConnectInitiatedEvent] Starting to connect service with 1 minimum endpoints {"coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.068 couchbase.endpoint/ts DEBUG: [com.couchbase.endpoint][EndpointStateChangedEvent] Endpoint changed state from DISCONNECTED to CONNECTING {"circuitBreaker":"DISABLED","coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.068 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceStateChangedEvent] Service changed state from DISCONNECTED to CONNECTING {"coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.069 com.couchbase.node/ts DEBUG: [com.couchbase.node][NodeStateChangedEvent] Node changed state from DISCONNECTED to CONNECTING {"coreId":1,"managerPort":"52091","remote":"external.ip.address"}
20200212-20:23:08.069 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceAddedEvent][705us] Service added to Node {"coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.069 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":2,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x1","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.069 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][2000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":1,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x0","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.070 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][1000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":2,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x1","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.070 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceConnectInitiatedEvent] Starting to connect service with 1 minimum endpoints {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:42210","type":"KV"}
20200212-20:23:08.070 couchbase.endpoint/ts DEBUG: [com.couchbase.endpoint][EndpointStateChangedEvent] Endpoint changed state from DISCONNECTED to CONNECTING {"bucket":"bucket-name","circuitBreaker":"DISABLED","coreId":1,"remote":"external.ip.address:42210","type":"KV"}
20200212-20:23:08.070 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceStateChangedEvent] Service changed state from DISCONNECTED to CONNECTING {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:42210","type":"KV"}
20200212-20:23:08.070 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceAddedEvent][885us] Service added to Node {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:42210","type":"KV"}
20200212-20:23:08.070 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent] Request CarrierBucketConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":3,"requestType":"CarrierBucketConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"bucket":"bucket-name","collection":"_default","opaque":"0x2","scope":"_default","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.071 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceConnectInitiatedEvent] Starting to connect service with 1 minimum endpoints {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.071 couchbase.endpoint/ts DEBUG: [com.couchbase.endpoint][EndpointStateChangedEvent] Endpoint changed state from DISCONNECTED to CONNECTING {"bucket":"bucket-name","circuitBreaker":"DISABLED","coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.071 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceStateChangedEvent] Service changed state from DISCONNECTED to CONNECTING {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.071 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][4000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":1,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x0","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.071 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][2000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":2,"requestType":"CarrierGlobalConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x1","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.071 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][1000us] Request CarrierBucketConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":3,"requestType":"CarrierBucketConfigRequest","retried":6,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"bucket":"bucket-name","collection":"_default","opaque":"0x2","scope":"_default","target":"external.ip.address","type":"kv"},"timeoutMs":10000}
20200212-20:23:08.071 .couchbase.service/ts DEBUG: [com.couchbase.service][ServiceAddedEvent][11ms] Service added to Node {"bucket":"bucket-name","coreId":1,"remote":"external.ip.address:52210","type":"KV"}
20200212-20:23:08.072 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent] Request CarrierBucketConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":4,"requestType":"CarrierBucketConfigRequest","retried":5,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"bucket":"bucket-name","collection":"_default","opaque":"0x3","scope":"_default","target":"external.ip.address","type":"kv"},"timeoutMs":10000}

Actually, I’m not sure whether this has anything to do with the NAT itself or just with the port allocation.

I’ve tried doing exactly the same thing on the local network, and it fails with what appears to be the same problem (NODE_NOT_AVAILABLE):

20200213-08:51:46.165 .couchbase.request/ts DEBUG: [com.couchbase.request][RequestRetryScheduledEvent][1000us] Request CarrierGlobalConfigRequest retry scheduled per RetryStrategy (Reason: NODE_NOT_AVAILABLE) {"completed":false,"coreId":1,"idempotent":true,"requestId":2,"requestType":"CarrierGlobalConfigRequest","retried":4,"retryReasons":["NODE_NOT_AVAILABLE"],"service":{"opaque":"0x1","target":"172.21.141.42","type":"kv"},"timeoutMs":10000}

Even so, I’m not really sure how to proceed with diagnosing this problem.

I realized this afternoon that I had specifically requested external network resolution in this code. I believe that is contributing to the problem I have had running it locally, but I haven’t had the opportunity to nail it down and I’m not sure when I will; probably sometime next week.
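
If I understand the options correctly, the first thing to try for local runs is to stop forcing it, e.g. by switching to AUTO, which picks internal or external addresses based on how the seed node was reached (a sketch of the one-line change to the environment above):

      .ioConfig(IoConfig.builder()
        .enableDnsSrv(false)
        // AUTO chooses internal vs external addresses heuristically;
        // EXTERNAL always forces the alternate addresses, even on the local network.
        .networkResolution(NetworkResolution.AUTO))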

Any time you’re having connectivity issues, it’s worth checking for lower-level problems with SDK doctor. Maybe try running it in that environment? It may not find anything, but it will rule out the common connectivity problems.

You’ll find pre-built binaries on the release page.

Usually the summary at the end is pretty easy to interpret. If you need help with interpretation, please feel free to post it or a link to it here.
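
For reference, a typical invocation looks something like this (substitute your own host, bucket, and credentials; check sdk-doctor --help for the exact flags in your build):

sdk-doctor diagnose "couchbase://your.host.name/bucket-name" -u Administrator -p password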

Thanks!

That was easy enough that I was able to try it from my phone on the train here.

I think it stalled though. I tried it on both nodes and it hit a roadblock at the same point.

13:42:33.063 INFO ▶ Parsing connection string `couchbase://172.21.141.42:42091/ziniki-unit`
13:42:33.063 INFO ▶ Connection string identifies the following CCCP endpoints:
13:42:33.063 INFO ▶   1. 172.21.141.42:42091
13:42:33.063 INFO ▶ Connection string identifies the following HTTP endpoints:
13:42:33.063 INFO ▶ Connection string specifies bucket `ziniki-unit`
13:42:33.063 WARN ▶ Your connection string specifies only a single host.  You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
13:42:33.063 INFO ▶ Performing DNS lookup for host `172.21.141.42`
13:42:33.063 INFO ▶ Attempting to connect to cluster via CCCP
13:42:33.063 INFO ▶ Attempting to fetch config via cccp from `172.21.141.42:42091`

I’m not sure if that should tell me something or not.

Interestingly, I can get further by replacing couchbase: with http:. It complains that I shouldn’t have done that, but otherwise it seems to work OK, with either the internal or the external address.

Here is the output from the HTTP run. Interestingly, even though I was on the local network, it said it was selecting the “external” network type and returned external addresses.

Am I right in thinking that the CCCP string still points at the management port, or should I be directing it to a different port? And am I right in thinking that it won’t use encryption unless I specify couchbases://…?

17:07:48.907 INFO ▶ Parsing connection string `http://172.21.141.52:52091/ziniki-unit`
17:07:48.907 WARN ▶ Connection string is using the deprecated `http://` scheme.  Use the `couchbase://` scheme instead!
17:07:48.907 INFO ▶ Connection string identifies the following CCCP endpoints:
17:07:48.907 INFO ▶ Connection string identifies the following HTTP endpoints:
17:07:48.907 INFO ▶   1. 172.21.141.52:52091
17:07:48.907 INFO ▶ Connection string specifies bucket `ziniki-unit`
17:07:48.907 WARN ▶ Your connection string specifies only a single host.  You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
17:07:48.907 INFO ▶ Performing DNS lookup for host `172.21.141.52`
17:07:48.907 INFO ▶ Not attempting CCCP, as the connection string does not support it
17:07:48.907 INFO ▶ Attempting to connect to cluster via HTTP (Terse)
17:07:48.907 INFO ▶ Attempting to fetch terse config via http from `172.21.141.52:52091`
17:07:48.909 INFO ▶ Selected the following network type: external
17:07:48.909 INFO ▶ Identified the following nodes:
17:07:48.909 INFO ▶   [0] external.ip.address
17:07:48.909 INFO ▶                     n1ql: 42093,            indexHttp:  9102,     indexStreamMaint:  9105
17:07:48.909 INFO ▶               indexHttps: 19102,                kvSSL: 11207,              capiSSL: 18092
17:07:48.909 INFO ▶               indexAdmin:  9100,            projector:  9999,              n1qlSSL: 18093
17:07:48.909 INFO ▶                     mgmt: 42091,        eventingDebug:  9140,          eventingSSL: 18096
17:07:48.909 INFO ▶       indexStreamCatchup:  9104,                 capi: 42092,              mgmtSSL: 18091
17:07:48.909 INFO ▶        eventingAdminPort: 42096,            indexScan:  9101,      indexStreamInit:  9103
17:07:48.909 INFO ▶                       kv: 42210
17:07:48.909 INFO ▶   [1] external.ip.address
17:07:48.909 INFO ▶                     capi: 52092,              n1qlSSL: 18093,              cbasSSL: 18095
17:07:48.909 INFO ▶            eventingDebug:  9140,                  fts: 52094,          eventingSSL: 18096
17:07:48.909 INFO ▶                indexScan:  9101,      indexStreamInit:  9103,                 n1ql: 52093
17:07:48.909 INFO ▶                  mgmtSSL: 18091,                 cbas: 52095,    eventingAdminPort: 52096
17:07:48.909 INFO ▶       indexStreamCatchup:  9104,                 mgmt: 52091,               ftsSSL: 18094
17:07:48.909 INFO ▶               indexAdmin:  9100,     indexStreamMaint:  9105,           indexHttps: 19102
17:07:48.909 INFO ▶                       kv: 52210,                kvSSL: 11207,              capiSSL: 18092
17:07:48.909 INFO ▶                  ftsGRPC:  9130,           ftsGRPCSSL: 19130,            indexHttp:  9102
17:07:48.909 INFO ▶                projector:  9999
17:07:48.909 WARN ▶ Your configuration was fetched via a non-optimal path, you should update your connection string and/or cluster configuration to allow CCCP config fetch
17:07:48.909 INFO ▶ Fetching config from `external.ip.address:42091`
17:07:48.919 INFO ▶ Failed to retreive cluster information (status code: 401)
17:07:48.928 INFO ▶ Successfully connected to KV service at `external.ip.address:42210`
17:07:48.935 INFO ▶ Successfully connected to MGMT service at `external.ip.address:42091`
17:07:48.943 INFO ▶ Successfully connected to CAPI service at `external.ip.address:42092`
17:07:48.952 INFO ▶ Successfully connected to N1QL service at `external.ip.address:42093`
17:07:48.960 INFO ▶ Successfully connected to KV service at `external.ip.address:52210`
17:07:48.969 INFO ▶ Successfully connected to MGMT service at `external.ip.address:52091`
17:07:48.976 INFO ▶ Successfully connected to CAPI service at `external.ip.address:52092`
17:07:48.982 INFO ▶ Successfully connected to N1QL service at `external.ip.address:52093`
17:07:48.988 INFO ▶ Successfully connected to FTS service at `external.ip.address:52094`
17:07:49.009 INFO ▶ Memd Nop Pinged `external.ip.address:42210` 10 times, 0 errors, 0ms min, 1ms max, 1ms mean
17:07:49.025 INFO ▶ Memd Nop Pinged `external.ip.address:52210` 10 times, 0 errors, 0ms min, 0ms max, 0ms mean
17:07:49.025 INFO ▶ Diagnostics completed

Summary:
[WARN] Connection string is using the deprecated `http://` scheme.  Use the `couchbase://` scheme instead!
[WARN] Your connection string specifies only a single host.  You should consider adding additional static nodes from your cluster to this list to improve your applications fault-tolerance
[WARN] Your configuration was fetched via a non-optimal path, you should update your connection string and/or cluster configuration to allow CCCP config fetch

Found multiple issues, see listing above.

It seems I was using the wrong port for sdk-doctor. I ran it again today, and both from inside and outside my NAT it works just fine if I specify the KV port (42210 in my case; usually 11210).
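
For the record, this is the shape of invocation that now works for me (internal address shown; from outside the NAT I use the external address and ports, and the password is elided):

sdk-doctor diagnose couchbase://172.21.141.42:42210/ziniki-unit -u test -p ...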

I’m not quite sure where that leaves me, but I will hopefully get back to what I was really trying to do sometime this week.

Hi, @gareth!
Have you had any success so far?
It looks like I’m facing a pretty similar issue.
The difference is that I run Couchbase in a Docker container and try to reach it from the host. I’m trying to follow the advice in “Make the ports 8092, 8093, 8094 and 8095 configurable” so that I can use Docker’s randomly assigned ports.
And my logs (for the Java SDK client and sdk-doctor) look like yours.
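
For context, a conventional fixed mapping along these lines would sidestep the problem, because the host ports match the container ports; my trouble starts as soon as they differ, which is exactly what random ports give you:

docker run -d --name cb -p 8091-8096:8091-8096 -p 11210:11210 couchbase
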
I would be grateful if you could share any information.

I do, but I’m not sure if it will help you.

I configured another node using the UI (I had previously configured nodes using a script using the REST API) and seem to have things working (although I haven’t been scientific enough in my testing, I’m sorry to say).

I compared the IP traffic in Wireshark in extreme detail, and the problem seems to be that the failing node is configured with an explicit IP address, whereas a node configured through the UI gets some meta-address that travels in the packets as “$HOST”. I think the client then translates this into “the IP address this packet came from”.

I haven’t yet figured out how to replicate that magic from the REST API.
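
If I had to guess, it is the explicit node rename in my script that bakes in the literal address, i.e. a setup step along these lines (a sketch, using the failing node’s internal address):

POST /node/controller/rename
hostname=172.21.141.42

The UI presumably leaves the hostname unset, so the config keeps the “$HOST” placeholder instead.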

Hope some of that helps.

Thank you!
And if you don’t mind one more question :slight_smile:
How did you manage to configure the node using the UI? I looked through all the options but couldn’t find it.
Could you please point it out?
Or maybe I just don’t understand what configuration you mean.

First off, I was able to do some more testing yesterday, and while it seems superficially better, I’m afraid it still doesn’t work :frowning:

To clarify what I was saying: as far as I know, you can’t configure the ports through the UI; I was talking about the initial setup of the cluster. Normally I do this through a script (because I stand clusters up and tear them down a lot in a cloud environment), but that appears to give different results from configuring the cluster through the UI.

I’m afraid I’m back to being very confused. Sorry if that is not very helpful.

That’s sad to hear. :frowning_face:

Hello everyone, I’m also facing the same issue after upgrading the Java SDK from 2.7 to 3.2.2. Is there a solution to it?