Streaming result of queries

Hi,

Few months ago, I raised a ticket about a crash in version 3.4.8: https://www.couchbase.com/forums/t/crash-with-3-4-8-nuget-package/36553$

The fix was to remove HttpCompletionOption.ResponseHeadersRead in .net 4.8 https://review.couchbase.org/c/couchbase-net-client/+/193873

But now, the side effect is that queries runs in synchronous mode: Async queries runs synchronous in .net 4.8 · Issue #369 · couchbaselabs/Linq2Couchbase · GitHub

This is annoying, so I digged and found the root cause of the crash in 3.4.8 and I have a fix

In QueryClient.ExecuteQuery we declare the HttClient as is:

using var httpClient = CreateHttpClient(options.TimeoutValue);

This means, once we leave the ExecuteQuery method, the http client will be disposed.

But before we leave the ExecuteQuery method, we keep an instance of the stream in result class

if (serializer is IStreamingTypeDeserializer streamingDeserializer)
{
    queryResult = new StreamingQueryResult<T>(stream, streamingDeserializer, ErrorContextFactory);
 }
else
{
    queryResult = new BlockQueryResult<T>(stream, serializer);
}

And the stream will be reused later for each enumerator loop

The side effect in .net 4.8, when we dispose the http client, it disposes also the stream

The fix is to not dispose the httclient in the ExecuteQuery method, but pass it to the result and let the result dipsose the client

In QueryClient.ExecuteQuery, declare the http client as is (remove the using):

var httpClient = CreateHttpClient(options.TimeoutValue);

In QueryClient.ExecuteQuery, pass the http client to the result

if (serializer is IStreamingTypeDeserializer streamingDeserializer)
{
    queryResult = new StreamingQueryResult<T>(httpClient, stream, streamingDeserializer, ErrorContextFactory);
}
else
{
    queryResult = new BlockQueryResult<T>(httpClient, stream, serializer);
}

in QueryResultBase, store the reference of the client


        private readonly HttpClient _httpClient;

        /// <summary>
        /// Creates a new QueryResultBase.
        /// </summary>
        /// <param name="httpClient">The <see cref="HttpClient"/> holding the request</param>;
        /// <param name="responseStream"><see cref="Stream"/> to read.</param>
        protected QueryResultBase(HttpClient httpClient, Stream responseStream)
        {
            _httpClient = httpClient;
            ResponseStream = responseStream ?? throw new ArgumentNullException(nameof(responseStream));
        }

in QueryResultBase.Dispose, dispose the client

        public virtual void Dispose()
        {
            ResponseStream?.Dispose();
            _httpClient?.Dispose();
        }

This is the same logic for search, analytics and views

I’m afraid that it may be a bit more complicated than this. While we do dispose of the HttpClient, we’re using a more advanced use pattern where we pass in a shared HttpMessageHandler and then tell it not to dispose that handler when the HttpClient is disposed.

This should mean, in theory, that the underlying connection isn’t closed when the client is disposed. There is some logic around pending requests here: https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/System/net/System/Net/Http/HttpClient.cs#L503 We’d need to dig into that a bit deeper to understand it I think.

That said, I think overall you may be onto something.

I’ve done some digging through the .NET source (both .NET 4 and 8) and this is what I’ve found. Note that this is based solely upon code review.

Both .NET 4 and 8 trigger a cancellation token when the HttpClient is disposed. This is meant to cancel pending requests. For example, if the request stream was still being sent to the server it would cancel that.

In .NET 4, the request is eventually handled by the legacy WebRequest infrastructure. This code registers a callback on the cancellation token passed to it which in turn closes the socket when it’s canceled. This could then cause the socket errors we were seeing if the dispose happened before the request body was read in full.

In .NET 8 we use the SocketsHttpHandler since the WebRequest infrastructure is deprecated. It appears that the cancellation token is unused past the point of sending the request body buried within the SocketsHttpHandler implementation, and the token being canceled while streaming the response body would have no effect. I also think it’s safe to conclude that .NET 6 behaves similarly.

All of this is to say that I believe @mdegroux theory is correct. If we delay disposing of the HttpClient until after the response body is fully streamed then we should avoid the socket exceptions we observed earlier.

However, there are other concerns as well, particularly around connection leaks. In .NET 4 using HttpCompletionOption.RequestHeadersRead absolutely requires that the underlying HttpResponseMessage or Stream be disposed when done. Failure to do so will “leak” the HTTP connection and not allow it’s reuse. When this is combined with the default .NET 4 behavior of limiting to 2 active HTTP connections per HTTP server this could have very detrimental effects. This would occur if the SDK consumer either doesn’t enumerate or otherwise dispose of the IQueryResult<T> instance that we return.

This is still an issue in .NET 6/8. However, it is mitigated in those versions for two reasons. First, the NET team added a finalizer so that GC will free the connection eventually. Second, the default is unlimited HTTP connections per server.

This means that making the change to .NET 4 is not without a risk of negatively impacting consumers. There may need to be some consideration given to making it opt-in for consumers who have ensured they are disposing correctly.

So in summary,

You will add something like
ClusterOptions.KeepHttpClientAliveAsLongResultIsNoDisosed (you get the ieda for the name :wink: )
false by default

I’m not sure what we’ll do, it will probably be a decision that @jmorris needs to make. It’s possible that the risk/reward benefits for older frameworks are not there. But if we do something I’d recommend an option like EnableHttpResponseStreaming that defaults to false on .NET 4 and true on .NET >= 6 combined with an across-the-board change to the HttpClient dispose behavior.

I think .net 4.8 should not be treated as an old framework

First it will be supported for many years by Microsoft
Then we (and not only us) rely on third party apps/components/… that can’t be migrated to .net 6+

It’s not so much treating it as an old framework as recognizing the customer base using it. Those who are using .NET 4 are much more likely to be very concerned about stability and therefore there is more reluctance to introduce changes that may increase instability. When this is combined with the different .NET 4 versions having subtlety different behaviors that are difficult to fully test for it gets a bit messy. There are just pros and cons to be weighed especially when discussing adding a feature, which is somewhat different than fixing a bug.

IIRC this is similar what we did in CB SDK 2.X, but the reason was to add streaming in backwards compatible way without changing the behavior for users. In that case the user had to opt in for streaming. I would say I am ok with that approach if it makes the SDK more stable for older frameworks. They can always “opt in” if they want to improve performance.

That being said, this could also break users if it’s changed as very long queries would not be streamed and instead be buffered which could lead to OOM exceptions (the whole reason for adding streaming). It’s a tradeoff here.

@jmorris

For the sake of clarity I want to elaborate on the current behavior:

.NET 4.x:

  • The HTTP response body is collected in a byte buffer and not returned until the entire body has been received. I’m currently unclear from a quick code review if this buffer may be spooled to disk or always lives on the heap
  • The deserialized objects when enumerating the IQueryResult are streamed (unless using an incompatible custom deserializer), but the first result is not available until after the entire body is received

.NET >=6:

  • The HTTP response body is returned as a stream after headers are read
  • The deserialized objects when enumerating the IQueryResult are streamed, the first result is available as soon as it is received over the wire (unless using an incompatible custom deserializer)

The majority of the OOM concerns are around the deserialized POCOs rather than the response body, IMO. What we are discussing changing is only related to the response body in .NET 4 and would improve things as relates to memory if enabled (so long as it isn’t misused and Dispose is called).

The “danger” of this plan is A) people enabling the setting thinking “streaming is good” and then being confused when their app has esoteric and hard-to-diagnose problems with connection pools because they didn’t dispose correctly and B) whatever other strange behaviors exist in .NET 4 that we don’t yet know about that may cause other problems.

1 Like

Hi,

Have you made a decision on what will be done or not?

@mdegroux -

I created a ticket for disabling streaming on older .NET versions by default in the SDK. From what I understand, this is the only real option we have here. If a user wants to stream, they will have to opt-in similar to the behavior in SDK2.

@jmorris

I think you misunderstand the goals. The current behavior in SDK 3 on .NET 4 is already identical to the SDK 2 behavior, which is to stream objects but not to stream bytes. I believe that @mdegroux wants us to support streaming both objects and bytes in .NET 4, like we do currently in .NET >=6. The bug in .NET 4 around streaming was already fixed previously by disabling byte streaming.

@mdegroux

@jmorris and I discussed this and decided it’s probably worth implementing support for full streaming in .NET 4. This means that the HTTP response body will now be streamed in addition to the existing behavior of streaming the deserialized POCOs. This will, however, be opt-in for .NET 4 consumers, though it will continue to be on by default for modern frameworks.

Here is the issue: Loading...

Also, if you’d like to test it here is the first draft implementation: https://review.couchbase.org/c/couchbase-net-client/+/198442

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.