Hi guys,
We’ve been dealing with an issue where calls to GetAsync fail repeatedly for certain documents. After some digging, I was able to pin down the exact situation that causes it. There seems to be a bug in how Utf8MemoryReader calls Decoder.Convert on .NET Framework 4.7.2 with Couchbase .NET SDK 3.4.12.
If the output buffer fills up at a point where its last free slot would receive the high surrogate half of a surrogate pair, Decoder.Convert stops one character short. The output buffer is not expanded to account for this, so the next call to Utf8MemoryReader.Read for a single character throws an exception.
Exception: The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'.
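The underlying Decoder behavior can be seen without Couchbase or Newtonsoft.Json. Here is a minimal C# sketch that assumes only System.Text; SurrogateSplitDemo and CharsDecoded are hypothetical names of my own, not SDK code. It decodes UTF-8 input into an output buffer of a given size and reports how many chars Decoder.Convert actually wrote.

```csharp
using System.Text;

// Hypothetical helper (not SDK code) that isolates the root cause:
// Decoder.Convert does not write half of a surrogate pair, and it throws when
// the output buffer cannot hold even one complete character.
public static class SurrogateSplitDemo
{
    // Decodes as much of the UTF-8 input as fits into charCount chars and
    // returns how many chars Decoder.Convert wrote.
    public static int CharsDecoded(byte[] utf8, int charCount)
    {
        // Encoding.UTF8 uses DecoderReplacementFallback, as in the reported exception.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var output = new char[charCount];

        // flush: false, like a reader that expects more calls to follow.
        decoder.Convert(utf8, 0, utf8.Length, output, 0, charCount,
            false, out _, out int charsUsed, out _);
        return charsUsed;
    }
}
```

With the bytes of "a\ud83e\udd3a" and a two-char buffer, CharsDecoded should return 1 rather than 2, the same one-short result as 1022 instead of 1023. With the bytes of "\ud83e\udd3a" alone and a one-char buffer, Convert has nothing it can write and throws an ArgumentException carrying the message quoted above.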
Example Code:
```csharp
using Couchbase;
using Newtonsoft.Json;
using System;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    internal class Program
    {
        static void Main(string[] args)
        {
            var couchbaseConnection = "couchbase://localhost";
            var couchbaseUser = "<User>";
            var couchbasePassword = "<Password>";
            var couchbaseBucket = "<Bucket>";
            var keyName = "test_key";

            var sb = new StringBuilder();
            sb.AppendLine("{\"Text\": \"");
            for (var i = 0; i < 1005; i++)
            {
                // Fill with a bunch of a's
                sb.Append("a");
            }

            // Add surrogate pairs towards the end of the string.
            //
            // The high surrogate 0xD83E must land at character position 1022 of the output buffer (buffer length 1024).
            // Because 0xD83E is a high surrogate, Utf8MemoryReader's _decoder.Convert returns 1022 characters read instead of 1023.
            //
            // Newtonsoft.Json then tries to read the last character, but it doesn't resize its buffer before making another Utf8MemoryReader.Read call.
            //
            // This results in the following exception:
            // The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'.
            sb.Append("\ud83e\udd3a\ud83e\udd3a\ud83e\udd3a\ud83e\udd3a\ud83e\udd3a\ud83e\udd3a\ud83e\udd3a\" }");

            var testItem = JsonConvert.DeserializeObject<TestPayload>(sb.ToString());

            Task.Run(async () =>
            {
                var cluster = await Cluster.ConnectAsync(couchbaseConnection, couchbaseUser, couchbasePassword);
                await cluster.WaitUntilReadyAsync(TimeSpan.FromSeconds(60));
                var bucket = await cluster.BucketAsync(couchbaseBucket).ConfigureAwait(false);
                var defaultCollection = await bucket.DefaultCollectionAsync();

                try
                {
                    await defaultCollection.InsertAsync(keyName, testItem);

                    // This throws the exception described above.
                    var result = await defaultCollection.GetAsync(keyName);
                    var payload = result.ContentAs<TestPayload>();
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex);
                }

                // Delete the key
                await defaultCollection.RemoveAsync(keyName);
                await cluster.DisposeAsync();
            }).Wait();

            Console.ReadLine();
        }

        public class TestPayload
        {
            public string Text { get; set; }
        }
    }
}
```
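For illustration, one way a reader could avoid this is to never give Decoder.Convert fewer than two output chars. This is a hypothetical sketch, not the SDK's Utf8MemoryReader or its actual fix; SurrogateSafeUtf8Reader is my own name, and it reads from a plain byte[] for simplicity. When the caller asks for a single char, it decodes into a private two-char buffer and hands back the low surrogate on the next call.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch, NOT the Couchbase SDK's Utf8MemoryReader: a TextReader over
// an in-memory UTF-8 payload that never asks Decoder.Convert to write into fewer
// than two chars, so a surrogate pair always has room.
public sealed class SurrogateSafeUtf8Reader : TextReader
{
    private readonly byte[] _bytes;
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();
    private int _bytePos;

    // Scratch space used when the caller asks for exactly one char; it can hold a
    // whole surrogate pair, and the second half is returned on the next call.
    private readonly char[] _pending = new char[2];
    private int _pendingPos;
    private int _pendingLen;

    public SurrogateSafeUtf8Reader(byte[] bytes)
    {
        _bytes = bytes ?? throw new ArgumentNullException(nameof(bytes));
    }

    // The base TextReader.Read() just returns -1, so route it through Read(char[], int, int).
    public override int Read()
    {
        var one = new char[1];
        return Read(one, 0, 1) == 0 ? -1 : one[0];
    }

    public override int Read(char[] buffer, int index, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (index < 0 || count < 0 || buffer.Length - index < count)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0) return 0;

        // First hand out any char left over from a pair decoded on a previous call.
        if (_pendingPos < _pendingLen)
        {
            buffer[index] = _pending[_pendingPos++];
            return 1;
        }

        if (_bytePos == _bytes.Length) return 0; // end of input

        if (count == 1)
        {
            // A one-char slot cannot hold a surrogate pair, so decode into _pending instead.
            _pendingLen = Decode(_pending, 0, _pending.Length);
            _pendingPos = 0;
            if (_pendingLen == 0) return 0;
            buffer[index] = _pending[_pendingPos++];
            return 1;
        }

        // Two or more free slots: at least one whole character always fits.
        return Decode(buffer, index, count);
    }

    private int Decode(char[] chars, int charIndex, int charCount)
    {
        // flush: true because the whole payload is already in memory. Callers always
        // pass at least two slots, so Convert writes something and stops early
        // (instead of throwing) when a later surrogate pair does not fit.
        _decoder.Convert(_bytes, _bytePos, _bytes.Length - _bytePos,
            chars, charIndex, charCount, true,
            out int bytesUsed, out int charsUsed, out _);
        _bytePos += bytesUsed;
        return charsUsed;
    }
}
```

Reading 1022 a's followed by a surrogate pair with Read(buffer, 0, 1023) should return 1022, and a following Read(buffer, 1022, 1) should return the high surrogate instead of throwing. Keeping the fix inside the reader means callers such as Newtonsoft.Json don't need to know anything about surrogate pairs.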