
feat: Instrument Lambda invocations in AWS SDK #2784

Open · wants to merge 12 commits into main
Conversation

@chynesNR (Member) commented Sep 27, 2024

Instrument calls to Lambdas and Lambda Aliases from the AWS SDK, with span attributes that allow linking the calls to the actual Lambda (specifically, the ARN is what's needed). Includes unit and integration tests. We are able to test without making a successful call to a Lambda, so no live testing resources are necessary.

Note that we are deliberately not creating an External segment, as that would result in an extra entity being created.
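For orientation, a minimal sketch of the idea, built from the wrapper APIs visible later in this review (the attribute name "cloud.resource_id" is an assumption here, not the PR's verbatim code):

// Hedged sketch: start a segment for the SDK call, keep it a leaf (see the
// review thread below), and attach the resolved ARN so the span can be
// linked to the Lambda entity. The attribute name is an assumption.
var segment = transaction.StartTransactionSegment(instrumentedMethodCall.MethodCall, "InvokeRequest");
segment.GetExperimentalApi().MakeLeaf();
transaction.AddCloudSdkAttribute("cloud.resource_id", arn);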

@chynesNR marked this pull request as ready for review October 15, 2024 15:18
@chynesNR requested a review from a team as a code owner October 15, 2024 15:18
{
if (string.IsNullOrWhiteSpace(name))
{
Log.Debug($"AddCloudSdkAttribute - Unable to set Cloud value on transaction because the key is null/empty");
Member

Should this say name instead of key?

Member Author

I was following the pattern in AddLambdaAttribute and AddFaasAttribute, but yeah, "name" is probably clearer.

Comment on lines 81 to 83
if (BadInvocations.Add(invocationName))
{
agent?.Logger.Debug($"Unable to parse function name '{invocationName}'");
Member

  1. It seems like it may be possible to run this code concurrently (two async Lambda invocations), and we are not using a thread-safe collection. This might cause some problems.
  2. Since we do not log this at the default log level, we may want to check whether that log level is enabled before storing that invocationName indefinitely.
  3. Do we need to put a cap on this collection so that it doesn't grow too large? Should we just log the first N times we encounter this problem?

Member

This comment applies to the other places where we use this collection and log a message.

Member

Would supportability metrics be useful for debugging these types of problems?

Member Author

  1. Ideally we're never going to hit this, but that is technically possible. I can make it thread-safe.
  2. Sure
  3. Since it's a Set, they'd have to be invoking a significant number of unique Lambda names, all of which are unparseable. That seems pretty unlikely, but a size check is easy enough.
  4. I've replaced all the logging calls with a helper method.
  5. I don't think so. This is more for troubleshooting individual cases of "my Lambda calls aren't linking". If we saw a large number of parsing failures, we'd still need to see some concrete examples in order to fix it. AWS provides a regex for what's a valid function identifier, and I've tried to cover all the possibilities in the unit tests, so I'm not expecting many surprises.

Member

If we don't think this is likely, and we just need to capture a few examples, maybe we don't need a collection at all and can just log the first 10 times we get something we can't parse?
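A minimal sketch of that capped approach, reusing the agent's logger surface seen elsewhere in this diff (the class and field names here are hypothetical):

using System.Threading;

internal static class ParseFailureLogger
{
    // Hypothetical names; illustrates logging only the first N failures.
    private const int MaxLoggedFailures = 10;
    private static int _failureCount;

    public static void LogParseFailure(IAgent agent, string invocationName)
    {
        // Interlocked keeps the counter safe across concurrent async
        // invocations, with no growing collection to cap.
        if (Interlocked.Increment(ref _failureCount) <= MaxLoggedFailures)
        {
            agent?.Logger.Debug($"Unable to parse function name '{invocationName}'");
        }
    }
}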

_arnCache.TryAdd(functionName, arn);
}
var segment = transaction.StartTransactionSegment(instrumentedMethodCall.MethodCall, "InvokeRequest");
segment.GetExperimentalApi().MakeLeaf();
Member

This will suppress the HttpClient instrumentation, so we will no longer get the distributed tracing headers added, or any of the other external call attributes that were previously collected for these calls (prior to this instrumentation).

Member Author

That's correct, and the expected behavior. If we leave in the HttpClient segment, an additional entity is created that can't be linked to the Lambda itself, because the URI is not unique enough. I take your point about the missing attributes, though, and will double-check with the other devs on this initiative.

Member

Yes, and we need to ensure that we are still generating the expected metrics so that the externals UI will work, and the span.kind is correct for the span that is ultimately generated.
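If header propagation does need to be preserved alongside the leaf segment, one option is for the wrapper to insert the distributed tracing headers itself. A sketch only; the dictionary carrier is hypothetical, and how those headers would reach the actual outbound InvokeRequest is an open question here:

using System.Collections.Generic;

// Sketch only: with a leaf segment the inner HttpClient call is not
// instrumented, so the wrapper would have to propagate DT headers
// explicitly. The carrier below is hypothetical.
var dtHeaders = new Dictionary<string, string>();
transaction.InsertDistributedTraceHeaders(dtHeaders,
    (carrier, key, value) => carrier[key] = value);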

}
catch (Exception e)
{
agent.Logger.Debug(e, "Unable to get RequestId from response metadata.");
Member

Is it possible that this could be logged too frequently at the Debug level?

Member Author

It's possible, though we don't usually limit error logging in our wrappers. I can make it a one-time thing.
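A hedged sketch of that one-time variant (hypothetical names; CompareExchange flips the flag exactly once, even under concurrency):

using System;
using System.Threading;

internal static class RequestIdLogger
{
    private static int _logged;

    public static void LogOnce(IAgent agent, Exception e)
    {
        // Only the first caller to swap 0 -> 1 actually logs.
        if (Interlocked.CompareExchange(ref _logged, 1, 0) == 0)
        {
            agent.Logger.Debug(e, "Unable to get RequestId from response metadata.");
        }
    }
}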

return null;
}

var getResponse = _getResultFromGenericTask.GetOrAdd(task.GetType(), t => VisibilityBypasser.Instance.GeneratePropertyAccessor<object>(t, "Result"));
Member

If you are using dynamic anyway, do you need the concurrent dictionary and visibility bypasser combination here?

Member Author

That's the pattern we use elsewhere, though now that I look at it I'm not sure why. I can simplify it.
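A sketch of that simplification (assumes the task is actually a completed Task<TResponse>; runtime binding would throw on a plain Task):

using System.Threading.Tasks;

internal static class TaskResultHelper
{
    // Illustrative: runtime binding resolves Result on the concrete
    // Task<T>, so no ConcurrentDictionary/VisibilityBypasser cache is needed.
    public static object GetTaskResult(Task task)
    {
        if (task == null || task.Status != TaskStatus.RanToCompletion)
            return null;

        return ((dynamic)task).Result;
    }
}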

@tippmar-nr
Member

Looks like we might want to review the Codecov report; patch coverage is a bit low...

@codecov-commenter

Codecov Report

Attention: Patch coverage is 66.34615% with 35 lines in your changes missing coverage. Please review.

Project coverage is 81.27%. Comparing base (27a78cb) to head (28a5ebe).
Report is 2 commits behind head on main.

Files with missing lines                                Patch %   Lines
...nsions/NewRelic.Agent.Extensions/Helpers/AwsSdk.cs   84.61%    9 Missing and 3 partials ⚠️
...gent/Core/Attributes/AttributeDefinitionService.cs    9.09%   10 Missing ⚠️
...nt/NewRelic/Agent/Core/Transactions/Transaction.cs   20.00%    8 Missing ⚠️
....Agent.Extensions/Collections/ConcurrentHashSet.cs    0.00%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2784      +/-   ##
==========================================
- Coverage   81.31%   81.27%   -0.05%     
==========================================
  Files         460      461       +1     
  Lines       29239    29358     +119     
  Branches     3231     3252      +21     
==========================================
+ Hits        23777    23861      +84     
- Misses       4669     4701      +32     
- Partials      793      796       +3     
Flag       Coverage Δ
Agent      82.16% <66.34%> (-0.06%) ⬇️
Profiler   73.33% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines                                Coverage Δ
....Agent.Extensions/Collections/ConcurrentHashSet.cs   30.00% <0.00%> (-2.73%) ⬇️
...nt/NewRelic/Agent/Core/Transactions/Transaction.cs   79.92% <20.00%> (-0.81%) ⬇️
...gent/Core/Attributes/AttributeDefinitionService.cs   94.36% <9.09%> (-1.28%) ⬇️
...nsions/NewRelic.Agent.Extensions/Helpers/AwsSdk.cs   84.61% <84.61%> (ø)

... and 1 file with indirect coverage changes
