Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Instrument Lambda invocations in AWS SDK #2784

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
1 change: 1 addition & 0 deletions .github/workflows/all_solutions.yml
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,7 @@ jobs:
AwsLambda.Sns,
AwsLambda.Sqs,
AwsLambda.WebRequest,
AwsSdk,
AzureFunction,
BasicInstrumentation,
CatInbound,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,7 @@ public interface IAttributeDefinitions

AttributeDefinition<object, object> GetLambdaAttribute(string name);
AttributeDefinition<object, object> GetFaasAttribute(string name);
AttributeDefinition<object, object> GetCloudSdkAttribute(string name);

AttributeDefinition<string, string> GetRequestParameterAttribute(string paramName);

Expand Down Expand Up @@ -190,6 +191,7 @@ public AttributeDefinitions(IAttributeFilter attribFilter)
private readonly ConcurrentDictionary<string, AttributeDefinition<string, string>> _requestHeadersAttributes = new ConcurrentDictionary<string, AttributeDefinition<string, string>>();
private readonly ConcurrentDictionary<string, AttributeDefinition<object, object>> _lambdaAttributes = new ConcurrentDictionary<string, AttributeDefinition<object, object>>();
private readonly ConcurrentDictionary<string, AttributeDefinition<object, object>> _faasAttributes = new();
private readonly ConcurrentDictionary<string, AttributeDefinition<object, object>> _cloudSdkAttributes = new();

private readonly ConcurrentDictionary<TypeAttributeValue, AttributeDefinition<TypeAttributeValue, string>> _typeAttributes = new ConcurrentDictionary<TypeAttributeValue, AttributeDefinition<TypeAttributeValue, string>>();

Expand Down Expand Up @@ -281,6 +283,19 @@ public AttributeDefinition<object, object> GetFaasAttribute(string name)
}


private AttributeDefinition<object, object> CreateCloudSdkAttribute(string attribName)
{
return AttributeDefinitionBuilder
.Create<object, object>(attribName, AttributeClassification.AgentAttributes)
.AppliesTo(AttributeDestinations.TransactionTrace)
.AppliesTo(AttributeDestinations.SpanEvent)
.Build(_attribFilter);
}

public AttributeDefinition<object, object> GetCloudSdkAttribute(string name)
{
return _cloudSdkAttributes.GetOrAdd(name, CreateCloudSdkAttribute);
}
public AttributeDefinition<object, object> GetCustomAttributeForTransaction(string name)
{
return _trxCustomAttributes.GetOrAdd(name, CreateCustomAttributeForTransaction);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -331,5 +331,10 @@ public void AddFaasAttribute(string name, object value)
{
return;
}

public void AddCloudSdkAttribute(string name, object value)
{
return;
}
}
}
12 changes: 12 additions & 0 deletions src/Agent/NewRelic/Agent/Core/Transactions/Transaction.cs
Original file line number Diff line number Diff line change
Expand Up @@ -1393,5 +1393,17 @@ public void AddFaasAttribute(string name, object value)
var faasAttrib = _attribDefs.GetFaasAttribute(name);
TransactionMetadata.UserAndRequestAttributes.TrySetValue(faasAttrib, value);
}

public void AddCloudSdkAttribute(string name, object value)
{
if (string.IsNullOrWhiteSpace(name))
{
Log.Debug($"AddCloudSdkAttribute - Unable to set Cloud value on transaction because the key is null/empty");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say name instead of key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the pattern in AddLambdaAttribute and AddFaasAttribute but yeah, "name" is probably clearer.

return;
}

var cloudAttrib = _attribDefs.GetCloudSdkAttribute(name);
TransactionMetadata.UserAndRequestAttributes.TrySetValue(cloudAttrib, value);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -311,5 +311,7 @@ ISegment StartMessageBrokerSegment(MethodCall methodCall, MessageBrokerDestinati
void AddLambdaAttribute(string name, object value);

void AddFaasAttribute(string name, object value);

void AddCloudSdkAttribute(string name, object value);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
// Copyright 2020 New Relic, Inc. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using NewRelic.Agent.Api;

namespace NewRelic.Agent.Extensions.Helpers
{
public static class AwsSdkHelpers
{
private static Regex RegionRegex = new Regex(@"^[a-z]{2}((-gov)|(-iso([a-z]?)))?-[a-z]+-\d{1}$", RegexOptions.Compiled);
private static bool LooksLikeARegion(string text) => RegionRegex.IsMatch(text);
private static bool LooksLikeAnAccountId(string text) => (text.Length == 12) && text.All(c => c >= '0' && c <= '9');
// Only log ARNs we can't parse once
private static HashSet<string> BadInvocations = new HashSet<string>();

// This is the full regex pattern for an ARN:
// (arn:(aws[a-zA-Z-]*)?:lambda:)?([a-z]{2}((-gov)|(-iso([a-z]?)))?-[a-z]+-\d{1}:)?(\d{12}:)?(function:)?([a-zA-Z0-9-_\.]+)(:(\$LATEST|[a-zA-Z0-9-_]+))?

// If it's a full ARN, it has to start with 'arn:'
// A partial ARN can contain up to 5 segments separated by ':'
// 1. Region
// 2. Account ID
// 3. 'function' (fixed string)
// 4. Function name
// 5. Alias or version
// Only the function name is required, the rest are all optional. e.g. you could have region and function name and nothing else
public static string ConstructArn(IAgent agent, string invocationName, string region, string accountId)
{
if (invocationName.StartsWith("arn:"))
{
return invocationName;
}
var segments = invocationName.Split(':');
string functionName = null;
string alias = null;
string fallback = null;

foreach (var segment in segments)
{
if (LooksLikeARegion(segment))
{
if (string.IsNullOrEmpty(region))
{
region = segment;
}
else
{
fallback = segment;
}
continue;
}
else if (LooksLikeAnAccountId(segment))
{
if (string.IsNullOrEmpty(accountId))
{
accountId = segment;
}
else
{
fallback = segment;
}
continue;
}
else if (segment == "function")
{
continue;
}
else if (functionName == null)
{
functionName = segment;
}
else if (alias == null)
{
alias = segment;
}
else
{
if (BadInvocations.Add(invocationName))
{
agent?.Logger.Debug($"Unable to parse function name '{invocationName}'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It seems like it may be possible to run this code concurrently (two async lambda invocations), and we are not using a thread safe collection. This might cause some problems.
  2. Since we do not log this at the default log level, we may want to perform a is log level enabled check before storing that invocationName indefinitely.
  3. Do we need to put a cap on this collection so that it doesn't grow too large? Should we just log the first N times we encounter this problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment applies to the other places where we use this collection and log a message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would supportability metrics be useful for debugging these types of problems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Ideally we're never going to hit this, but that is technically possible. I can make it thread safe.
  2. Sure
  3. Since it's a Set, they'd have to be invoking a significant number of unique Lambda names, all of which are un-parse-able. That seems pretty unlikely, but a size check is easy enough.
  4. I've replaced all the logging calls with a helper method.
  5. I don't think so. This is more for troubleshooting individual cases of "my Lambda calls aren't linking". If we saw a large number of parsing failures, we'd still need to see some concrete examples in order to fix it. AWS provides a regex for what's a valid function identifier, and I've tried to cover all the possibilities in the unit tests, so I'm not expecting many surprises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't think this is likely, and we just need to capture a few examples, maybe we don't need a collection at all and just log the first 10 times we get something we can't parse?

}
return null;
}
}

if (string.IsNullOrEmpty(functionName))
{
if (!string.IsNullOrEmpty(fallback))
{
functionName = fallback;
}
else
{
if (BadInvocations.Add(invocationName))
{
agent?.Logger.Debug($"Unable to parse function name '{invocationName}'");
}
return null;
}
}

if (string.IsNullOrEmpty(accountId))
{
if (BadInvocations.Add(invocationName))
{
agent?.Logger.Debug($"Need account ID in order to resolve function '{invocationName}'");
}
return null;
}
if (string.IsNullOrEmpty(region))
{
if (BadInvocations.Add(invocationName))
{
agent?.Logger.Debug($"Need region in order to resolve function '{invocationName}'");
}
return null;
}
if (!string.IsNullOrEmpty(alias))
{
return $"arn:aws:lambda:{region}:{accountId}:function:{functionName}:{alias}";
}
return $"arn:aws:lambda:{region}:{accountId}:function:{functionName}";
}
}
}
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
// Copyright 2020 New Relic, Inc. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using NewRelic.Agent.Api;
using NewRelic.Agent.Extensions.Providers.Wrapper;

Expand All @@ -19,6 +21,23 @@ public CanWrapResponse CanWrap(InstrumentedMethodInfo methodInfo)
return new CanWrapResponse(WrapperName.Equals(methodInfo.RequestedWrapperName));
}

private string GetRegion(IAgent agent, dynamic requestContext)
{
try
{
var clientconfig = requestContext.ClientConfig;
var regionEndpoint = clientconfig.RegionEndpoint;
var systemName = regionEndpoint.SystemName;
return systemName;
}
catch (Exception e)
{
agent.Logger.Debug(e, $"AwsSdkPipelineWrapper: Unable to get region from requestContext.");
}

return "";
}

public AfterWrappedMethodDelegate BeforeWrappedMethod(InstrumentedMethodCall instrumentedMethodCall, IAgent agent, ITransaction transaction)
{
// Get the IExecutionContext (the only parameter)
Expand Down Expand Up @@ -54,6 +73,10 @@ public AfterWrappedMethodDelegate BeforeWrappedMethod(InstrumentedMethodCall ins
{
return SQSRequestHandler.HandleSQSRequest(instrumentedMethodCall, agent, transaction, request, isAsync, executionContext);
}
else if (requestType == "Amazon.Lambda.Model.InvokeRequest")
{
return LambdaInvokeRequestHandler.HandleInvokeRequest(instrumentedMethodCall, agent, transaction, request, isAsync, GetRegion(agent, requestContext));
}

if (_unsupportedRequestTypes.Add(requestType)) // log once per unsupported request type
agent.Logger.Debug($"AwsSdkPipelineWrapper: Unsupported request type: {requestType}. Returning NoOp delegate.");
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
// Copyright 2020 New Relic, Inc. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

using System.Collections.Concurrent;
using System;
using System.Threading.Tasks;
using NewRelic.Agent.Api;
using NewRelic.Agent.Api.Experimental;
using NewRelic.Agent.Extensions.Providers.Wrapper;
using NewRelic.Reflection;
using NewRelic.Agent.Extensions.Helpers;

namespace NewRelic.Providers.Wrapper.AwsSdk
{
internal static class LambdaInvokeRequestHandler
{
private static ConcurrentDictionary<Type, Func<object, object>> _getResultFromGenericTask = new();
private static ConcurrentDictionary<string, string> _arnCache = new ConcurrentDictionary<string, string>();

private static object GetTaskResult(object task)
{
if (((Task)task).IsFaulted)
{
return null;
}

var getResponse = _getResultFromGenericTask.GetOrAdd(task.GetType(), t => VisibilityBypasser.Instance.GeneratePropertyAccessor<object>(t, "Result"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are using dynamic anyways, do you need to use the concurrent dictionary and visibility bypasser combination here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the pattern we use elsewhere, though now that I look at it I'm not sure why. I can simplify it.

return getResponse(task);
}

private static void SetRequestIdIfAvailable(IAgent agent, ITransaction transaction, dynamic response)
{
try
{
dynamic metadata = response.ResponseMetadata;
nrcventura marked this conversation as resolved.
Show resolved Hide resolved
string requestId = metadata.RequestId;
transaction.AddCloudSdkAttribute("aws.requestId", requestId);
}
catch (Exception e)
{
agent.Logger.Debug(e, "Unable to get RequestId from response metadata.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that this could be logged too frequently at the Debug level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, though we don't usually limit error logging in our wrappers. I can make it a one-time thing.

}
}

public static AfterWrappedMethodDelegate HandleInvokeRequest(InstrumentedMethodCall instrumentedMethodCall, IAgent agent, ITransaction transaction, dynamic request, bool isAsync, string region)
{
string functionName = request.FunctionName;
string qualifier = request.Qualifier;
if (!string.IsNullOrEmpty(qualifier) && !functionName.EndsWith(qualifier))
{
functionName = $"{functionName}:{qualifier}";
}
string arn;
if (!_arnCache.TryGetValue(functionName, out arn))
nrcventura marked this conversation as resolved.
Show resolved Hide resolved
{
arn = AwsSdkHelpers.ConstructArn(agent, functionName, region, "");
_arnCache.TryAdd(functionName, arn);
}
var segment = transaction.StartTransactionSegment(instrumentedMethodCall.MethodCall, "InvokeRequest");
nrcventura marked this conversation as resolved.
Show resolved Hide resolved
segment.GetExperimentalApi().MakeLeaf();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will suppress the HttpClient instrumentation, so we will no longer get the distributed tracing headers added, or any of the other external call attributes that were previously collected for these calls (prior to this instrumentation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, and the expected behavior. If we leave in the HttpClient segment, an additional Entity is created that can't be linked to the Lambda itself, because the URI is not unique enough. I take your point about the missing attributes, though, and will double check with the other devs on this initiative..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we need to ensure that we are still generating the expected metrics so that the externals UI will work, and the span.kind is correct for the span that is ultimately generated.


transaction.AddCloudSdkAttribute("cloud.platform", "aws_lambda");
transaction.AddCloudSdkAttribute("aws.operation", "InvokeRequest");
transaction.AddCloudSdkAttribute("aws.region", region);


if (!string.IsNullOrEmpty(arn))
{
transaction.AddCloudSdkAttribute("cloud.resource_id", arn);
}

if (isAsync)
{
return Delegates.GetAsyncDelegateFor<Task>(agent, segment, true, InvokeTryProcessResponse, TaskContinuationOptions.ExecuteSynchronously);

void InvokeTryProcessResponse(Task responseTask)
{
try
{
if (responseTask.Status == TaskStatus.Faulted)
{
transaction.NoticeError(responseTask.Exception);
}
SetRequestIdIfAvailable(agent, transaction, GetTaskResult(responseTask));
}
finally
{
segment?.End();
}
}
}
else
{
return Delegates.GetDelegateFor<object>(
onFailure: ex => segment.End(ex),
onSuccess: response =>
{
SetRequestIdIfAvailable(agent, transaction, response);
segment.End();
}
);
}
}
}
}
Loading
Loading