.NET SDK 3.3.1 Throws InvalidCastException (KetamaNode to VBucket)

Hi,

The 3.3.1 SDK started throwing InvalidCastException with the following message:

Unable to cast object of type 'Couchbase.Core.Sharding.KetamaNode' to type 'Couchbase.Core.Sharding.VBucket'

at Couchbase.KeyValue.CouchbaseCollection.GetAnyReplicaAsync(String id, GetAnyReplicaOptions options)

It looks like this line in GetAnyReplicaAsync causes the exception, but I have no idea how it can get into that state:

var vBucket = (VBucket) _bucket.KeyMapper!.MapKey(id);

Could someone please take a look at this?

UPDATE

The state of the cluster was the following:

  • We had a failed node with Couchbase Server installed
  • A failover operation occurred and the node was taken out of the load

@eugene-shcherbo

Is this for a Memcached bucket? KetamaNode only applies to Memcached buckets, and those buckets don’t have replicas. So either you are using an unsupported API for the bucket type (which could probably use a better error message), or there’s some problem happening in bootstrap causing it to misclassify the bucket.
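To illustrate why a misclassified bucket would surface as this exception, here is a minimal sketch. The types below are hypothetical stand-ins that only borrow the names from the exception message; they are not the SDK's real classes, and MapKey here is a simplified assumption about how the mapper is chosen at bootstrap. It shows that if the bucket is treated as Memcached, the mapper yields a KetamaNode, and a direct cast to VBucket fails, whereas a type check lets the caller report the mismatch explicitly.

```csharp
using System;

// Hypothetical stand-ins for illustration only; NOT the SDK's real types.
interface IMappedNode { }
sealed class VBucket : IMappedNode { }     // assumed result for a Couchbase bucket
sealed class KetamaNode : IMappedNode { }  // assumed result for a Memcached bucket

static class Demo
{
    // Simulates a key mapper that depends on the bucket type detected at bootstrap.
    static IMappedNode MapKey(bool classifiedAsMemcached) =>
        classifiedAsMemcached ? (IMappedNode) new KetamaNode() : new VBucket();

    static void Main()
    {
        // Pretend bootstrap misclassified a Couchbase bucket as Memcached.
        IMappedNode mapped = MapKey(classifiedAsMemcached: true);

        // A direct cast, (VBucket) mapped, would throw InvalidCastException here.
        // A type check makes the mismatch visible without throwing.
        if (mapped is VBucket)
        {
            Console.WriteLine("vBucket mapping: replica reads are possible");
        }
        else
        {
            Console.WriteLine($"Unexpected mapping type: {mapped.GetType().Name}");
        }
    }
}
```

This only demonstrates the failure mode; it does not fix the underlying bootstrap misclassification.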

@btburnett3

No, this is for a Couchbase bucket. This is definitely something happening at bootstrap, but I'm not sure what might cause the process to misclassify the bucket type.

Looks like this may have been fixed in 3.3.2? Can you try upgrading?

@btburnett3

Yeah, thank you. I saw this, but first of all I want a deterministic way to reproduce it. Then I can check whether it's fixed in 3.3.2.

I tried this with some simple scenarios (breaking a node, breaking the network, etc.), but haven't succeeded yet.

From the notes, it appears it can happen in a mixed-state cluster (some nodes on 6.5 and some on 7.0) where CCCP fails, so the SDK falls back to HTTP bootstrap. I guess there may be other scenarios where it falls back to HTTP bootstrap and hits the same issue.

@jmorris Do you have any ideas?

@btburnett3

This reproduces consistently after doing a graceful failover of a node. I will try the same steps with 3.3.2 and get back with the results.

@eugene-shcherbo @btburnett3 It will be interesting to see whether this is resolved in 3.3.2; if not, we can create a ticket and get it resolved.

Jeff

@jmorris @btburnett3 Looks like it has been resolved in 3.3.2, thank you guys.

It seems I can’t update the original post any more, so I will add more info about the cluster state in this reply:

The state of the cluster was the following:

  • We had a failed node with Couchbase Server installed
  • A failover operation occurred and the node was taken out of the load
  • The node was in the DNS SRV records (this was a root cause of the problem)