Hi,
The 3.3.1 SDK started throwing an InvalidCastException with the following message:
Unable to cast object of type 'Couchbase.Core.Sharding.KetamaNode' to type 'Couchbase.Core.Sharding.VBucket'
at Couchbase.KeyValue.CouchbaseCollection.GetAnyReplicaAsync(String id, GetAnyReplicaOptions options)
It looks like this line in GetAnyReplicaAsync causes the exception, but I have no idea how it can get into that state:
var vBucket = (VBucket) _bucket.KeyMapper!.MapKey(id);
Could someone please take a look at this?
UPDATE
The state of the cluster was the following:
- We had a failed node with Couchbase Server installed.
- A failover occurred and the node was taken out of the load.
@eugene-shcherbo
Is this for a Memcached bucket? KetamaNode only applies to Memcached buckets, and those buckets don't have replicas. So either you are using an unsupported API for the bucket type (which could probably use a better error message), or there's some problem happening in bootstrap causing it to misclassify the bucket.
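A minimal sketch of the failure mode described above, using hypothetical stand-in types rather than the real SDK classes: `IMappedNode`, `VBucket`, and `KetamaNode` here are simplified placeholders, and `ReplicaGuard` is an invented helper, not an SDK API. It shows how pattern matching on the mapped node, instead of a hard cast, could surface a clearer error when a bucket has been treated as Memcached:

```csharp
using System;

// Hypothetical stand-ins for the SDK's mapped-node types (assumption:
// the real Couchbase.Core.Sharding classes carry far more state).
public interface IMappedNode { }
public sealed class VBucket : IMappedNode { }     // Couchbase bucket (vBucket hashing)
public sealed class KetamaNode : IMappedNode { }  // Memcached bucket (Ketama hashing)

// Invented helper for illustration only; not part of the SDK.
public static class ReplicaGuard
{
    // Pattern-match instead of hard-casting, so a wrong key-mapper type
    // yields a descriptive error rather than an InvalidCastException.
    public static VBucket RequireVBucket(IMappedNode mapped) =>
        mapped switch
        {
            VBucket vBucket => vBucket,
            KetamaNode _ => throw new NotSupportedException(
                "Key mapped to a KetamaNode: the bucket was bootstrapped as Memcached, which has no replicas."),
            _ => throw new InvalidOperationException(
                "Unexpected mapped node type: " + mapped?.GetType().Name)
        };
}
```

With these stand-ins, `ReplicaGuard.RequireVBucket(new VBucket())` returns the vBucket, while `ReplicaGuard.RequireVBucket(new KetamaNode())` throws a `NotSupportedException` that names the likely cause.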
@btburnett3
No, this is for a Couchbase bucket. This is definitely something happening at bootstrap, but I'm not sure what might cause the process to misclassify the bucket type.
Looks like this may have been fixed in 3.3.2? Can you try upgrading?
committed 10:22PM - 16 May 22 UTC
Motivation
----------
The reason for the NMVB is that the SDK thinks it's connecting to a Memcached
bucket and is trying to use Ketama hashing instead of VBucket hashing. In a
mixed state it appears that the CCCP call fails on the server side and when
the client degrades to HTTP streaming this happens.
Modifications
-------------
- Refactor CreateAndBootstrapAsync in ClusterContext to determine the
bucket type by checking Config.BucketCapabilities
- The initial config fetch is now done in CreateAndBootStrapAsync and
passed via ctor to each bucket type.
- Add IHttpClusterMapFactory and impl; wire these into DI so they
resolve when the using classes are instantiated.
- Improve logging and log redaction in ClusterContext to make it easier
to analyze logs at INFO level and below.
- Update parameters that accept an IBucket to just use the name as the
IBucket reference may not have been created yet.
- Remove BucketType dependence when possible
- Make MemcachedBucket take an IHttpClusterMapFactory to generate
BucketConfigs internally (outside of the standard pub/sub loop).
- Fixup unit tests
Result
------
Bucket creation is now driven by the ClusterCapabilities which limits
the possibility of the wrong bucket type being created.
Change-Id: I205d3c82315bed995982d78954e5196ad6d1e035
Reviewed-on: https://review.couchbase.org/c/couchbase-net-client/+/174784
Tested-by: Build Bot <build@couchbase.com>
Reviewed-by: Richard Ponton <richard.ponton@couchbase.com>
@btburnett3
Yeah, thank you. I saw this, but first of all I want a deterministic way to reproduce it. Then I can check whether it's fixed in 3.3.2.
I tried some simple scenarios (like breaking a node, breaking the network, etc.), but haven't succeeded yet.
From the notes, it appears it can happen in a mixed state cluster (some nodes on 6.5 and some on 7.0) where CCCP fails so it falls back to HTTP bootstrap. I guess there may be other scenarios where it falls back to HTTP bootstrap and has the same issue.
@jmorris Do you have any ideas?
@btburnett3
This is reproduced consistently after doing a graceful failover of a node. Will try the same steps with 3.3.2 and get back with the results.
@eugene-shcherbo @btburnett3 It will be interesting to see whether this is resolved in 3.3.2; if not, we can create a ticket and get it resolved.
Jeff
@jmorris @btburnett3 Looks like it has been resolved in 3.3.2, thank you guys.
Seems like I can’t update the original post any more, so will add more info about the cluster state in this reply:
The state of the cluster was the following:
- We had a failed node with Couchbase Server installed.
- A failover occurred and the node was taken out of the load.
- The node was still in the DNS SRV records (this was a root cause of the problem).