.NET SDK 3.3.X - Intermittent bootstrap error

Hello,

We’ve been using the newer 3.3.X SDK (specifically 3.3.4) for a while and have not had any issues of note.

Recently, in one of our environments, we started getting bootstrap errors after deploying a new version of our service.

To investigate further, I used a tool to connect to CB in the problematic environment, and it failed once in four tries (success, fail, success, success). The good and bad logs look like this:

Good:

Logger - EnableTls is set to True
Logger - ForceIpAsTargetHost is set to False
Couchbase.Core.IO.Connections.Channels.ChannelConnectionPool - Using the ChannelConnectionPool.
Couchbase.Core.ClusterContext - NetworkResolution [default] using default ------
Couchbase.Core.ClusterContext - Bootstrapping: initializing a global non-bootstrap node [-------]
Couchbase.Query.QueryClient - Enabling Enhanced Prepared Statements
Logger - Opening Couchbase bucket: ‘-----’
Couchbase.Core.ClusterContext - Bootstrapping: created a membase bucket for --------.

Bad:

Logger - EnableTls is set to True
Logger - ForceIpAsTargetHost is set to False
Couchbase.Core.IO.Connections.Channels.ChannelConnectionPool - Using the ChannelConnectionPool.
Logger - Opening Couchbase bucket: ‘-----’
Logger - Could not initialize Couchbase cluster and/or ----- bucket: Cluster has not yet bootstrapped. Call WaitUntilReadyAsync(…) to wait for it to complete.

Our CB server in this environment did have some issues last week (it was bombarded with more load than it was sized to handle), but we thought we had fixed things.

Any ideas about what could be wrong or what to check? Is the CB SDK unable to create or get a connection from the pool? If so, how is that tied to the CB server?

Also, I noticed that when the bootstrap error occurred, memory usage in our service kept rising until the service crashed and restarted. I don’t know whether this comes from how we implemented the service or from something in the CB SDK, but it might be worth looking at.

NOTE: We have also tested with the latest version, 3.3.6, and the same bootstrap error occurs in the bad environment (but not in the good one).

@obawin

From your description it sounds environment-related (as you mentioned). My guess is that it’s a connectivity issue. The next step would be to use SDK Doctor and/or SDK logging at the DEBUG level to see what is going wrong.

The memory growth should probably be investigated to determine its cause. There are currently no other reports of memory issues during bootstrapping, but either way it’s best to find out what is going on, whether in the SDK or in your app.

Thanks for the suggestions. I enabled Debug logging, and we were able to trace the problem to an incorrect IP that the CB hostname was resolving to (so it was indeed an environment/config issue).
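One way to spot this kind of problem outside the SDK is to resolve the CB hostname yourself and attempt a plain TCP connection to every address it returns. The sketch below is a minimal Python example, not an SDK feature; the hostname and port in the usage comment are placeholders (11207 is assumed as the TLS key-value port because EnableTls is True in the logs above). A successful TCP connect only shows that an address is reachable, not that it belongs to a healthy Couchbase node.

```python
import socket


def resolve_all(host: str, port: int) -> list[str]:
    """Return every distinct IP address the hostname resolves to, in resolver order."""
    seen: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def probe(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a plain TCP connect; True if it succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


def check_host(host: str, port: int, timeout: float = 2.0) -> dict[str, bool]:
    """Map each resolved address of `host` to whether it accepted a TCP connection."""
    return {ip: probe(ip, port, timeout) for ip in resolve_all(host, port)}


# Example usage (placeholder hostname and assumed port, substitute your own):
#   for ip, ok in check_host("cb.example.internal", 11207).items():
#       print(ip, "reachable" if ok else "UNREACHABLE")
```

Running this a few times from the service's host also helps show whether the resolver's answer changes between runs, which would fit the intermittent failures described above.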

Question: Is it expected that bootstrapping fails if any one of the IPs in a multi-IP configuration is bad? We have 3 nodes, and the CB hostname resolved to 2 good IPs and 1 bad IP.


No, the client should try the next node in the list. Could it possibly have been timing out?
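To illustrate the "try the next node" behaviour and how a timeout could still get in the way, the sketch below walks a list of seed addresses in order, giving each one a bounded connect timeout before moving on. It is a generic Python illustration, not the .NET SDK's actual bootstrap code; the function `connect_first_reachable` and its timeout values are hypothetical. If an unresponsive address eats most of the overall time budget, later (good) addresses may never get a chance, which is one way a single bad IP could surface as a bootstrap failure.

```python
import socket
import time
from typing import Optional, Sequence, Tuple


def connect_first_reachable(
    targets: Sequence[Tuple[str, int]],
    per_attempt_timeout: float = 2.0,
    overall_budget: float = 10.0,
) -> Tuple[Optional[socket.socket], list]:
    """Try each (host, port) in order; return the first connected socket and the failures seen.

    Hypothetical helper for illustration only: a failed or slow target is
    skipped and the next one is tried, until the overall time budget runs out.
    """
    deadline = time.monotonic() + overall_budget
    failures = []
    for target in targets:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            failures.append((target, "skipped: overall budget exhausted"))
            continue
        try:
            # Never wait longer than what is left of the overall budget.
            sock = socket.create_connection(
                target, timeout=min(per_attempt_timeout, remaining)
            )
            return sock, failures
        except OSError as exc:
            # Refused, unreachable or timed out: record it and move on.
            failures.append((target, repr(exc)))
    return None, failures
```

Keeping the per-attempt timeout well below the overall budget is the design choice that lets a bad address be skipped rather than consuming the whole bootstrap window.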

Yes, possibly.

We are investigating what exactly caused the bad IP (maybe DNS caching), but it looks like the service code and the CB SDK are OK.

Thanks for the insights!