Encode Duration is tracked twice for the Threshold Tracing

QUICK SUMMARY

When logging slow operations Couchbase v3 exposes the duration of the encoding the operation in microseconds, it’s called encode_duration_us. I noticed that it has the larger value than expected. Investigating the code seems like a request span for the operation is Disposed twice which means that it adds this tag request_encoding_duration to the underlying activity object two times and to the parent activity 3 times.

DETAILS

Let’s consider an example with a get request. Get operation tracing is represented as a outer request span object. Then in the OperationBase.SendAsync method when encoding of the operation happens this outer span creates a child span for the encoding operation.

Task SendAsync(...) {
   ... 
   using var encodingSpan = Span.EncodingSpan();
  // encoding logic
  encodingSpan.Dispose(); // Dispose 1
  // send operation logic
 // Dispose 2
}

Every time the Dispose method is called on the span object, RequestSpan.End method is called which sets to the underlying activity this tag request_encoding_duration and then it moves all its tags to the parent activity:

void End() {
   ...
   case InnerRequestSpans.EncodingSpan.Name:
   {
       var durationStr = SetEndTimeAndDuration();
       SetAttribute(ThresholdTags.EncodeDuration, durationStr);
       ...
    }
    ...
    if (parentSpan != null)
    {
       foreach (var activityTag in _activity.Tags)
       {
           parentSpan.SetAttribute(activityTag.Key, activityTag.Value!);
       }
     }
}

So this means that when we call Dispose twice, we call End twice and we add that tag two times to the encoding activity object and three times (1 tag on first call + 2 tags on second call) to the parent activity object.

Then the ThresholdListener collects the total encoding value as a sum of these tags which 1) have two duplicated values and 2) one tag contains encoding + send time

Currently, from the client perspective this can be fixed by providing custom RequestTracer and RequestSpan classes to the SDK (thankfully it allows doing this).

Question: can some tag names for traceability be publicly available instead of internally? Tags like these:

  • InnerRequestSpans.DispatchSpan.Attributes.TimeoutMilliseconds
  • InnerRequestSpans.DispatchSpan.Attributes.ServerDuration
  • OuterRequestSpans.Attributes.Operation
  • etc

@eugene-shcherbo

Thanks for the detailed report! I created a ticket for fixing based on what you reported here.

I will look into it, but imagine that they could be public. What is the use-case?

Jeff

1 Like

Hi @jmorris , thank you for the reply and taking actions on this.

Regarding to the tag names visibility and use case - I want to expose these data via prometheus as a custom metric showing the threshold requests and their durations, something like this:

cb_slow_requests{
   duration=(total|encode|last_server|total_server|...), 
   operation=(upsert|get|...)}

So I need to parse the activity tags for these values to rearrange them in a custom way. I’m still experimenting with it, not sure how useful that prometheus metric will be, will check.

1 Like

@eugene-shcherbo

I pushed a commit to our code review system Gerrit.

I believe this should fix the issue; feel free to leave a comment there.

Jeff

1 Like

Thanks @jmorris ,

I agree with the change, but I also think it makes sense to fix the OperationBase.SendAsync method, so that it doesn’t call Dispose twice. What do you think?

Yes, that sounds reasonably.

@eugene-shcherbo

Looking deeper into this, there is no longer any side effects if dispose is called twice; however, changing the code to only call it once changes the encoding timings (potentially as we need to rearrange code). If we remove the using and only call Dispose explicitly it’s possible that its never called (an exception is thrown, for example) which would have potential side effects.

The other option is to call End instead of Dispose, but were getting the same behavior as just calling Dispose.

Jeff

1 Like

Hello @jmorris

Gotcha, completely agree with this then. Thank you for looking into this :slight_smile:

1 Like