Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in EnhancedHttpRetryHelper #5905

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

akoeplinger
Copy link
Contributor

@akoeplinger akoeplinger commented Jul 4, 2024

Bug

Fixes: NuGet/Home#13545

Regression? Last working version: 9.0-preview5

Description

When multiple threads try to access the RetryCount property we sometimes hit a case where RetryCount returned 0 (*).
This caused the loop in ProcessHttpSourceResultAsync() to be skipped completely because retry <= maxRetries:

int maxRetries = _enhancedHttpRetryHelper.IsEnabled ? _enhancedHttpRetryHelper.RetryCount : 3;
for (var retry = 1; retry <= maxRetries; ++retry)
{
var httpSourceCacheContext = HttpSourceCacheContext.Create(cacheContext, isFirstAttempt: retry == 1);
try
{
return await _httpSource.GetAsync(
new HttpSourceCachedRequest(
url,
"nupkg_" + identity.Id.ToLowerInvariant() + "." + identity.Version.ToNormalizedString(),
httpSourceCacheContext)
{
EnsureValidContents = stream => HttpStreamValidation.ValidateNupkg(url, stream),
IgnoreNotFounds = true,
MaxTries = 1,
IsRetry = retry > 1,
IsLastAttempt = retry == maxRetries
},
async httpSourceResult => await processAsync(httpSourceResult),
logger,
token);
}
catch (TaskCanceledException) when (retry < maxRetries)
{
// Requests can get cancelled if we got the data from elsewhere, no reason to warn.
var message = string.Format(CultureInfo.CurrentCulture, Strings.Log_CanceledNupkgDownload, url);
logger.LogMinimal(message);
}
catch (Exception ex) when (retry < maxRetries)
{
var message = string.Format(
CultureInfo.CurrentCulture,
Strings.Log_FailedToDownloadPackage,
identity,
url)
+ Environment.NewLine
+ ExceptionUtilities.DisplayMessage(ex);
logger.LogMinimal(message);
if (_enhancedHttpRetryHelper.IsEnabled &&
ex.InnerException != null &&
ex.InnerException is IOException &&
ex.InnerException.InnerException != null &&
ex.InnerException.InnerException is System.Net.Sockets.SocketException)
{
// An IO Exception with inner SocketException indicates server hangup ("Connection reset by peer").
// Azure DevOps feeds sporadically do this due to mandatory connection cycling.
// Delaying gives Azure more of a chance to recover.
logger.LogVerbose("Enhanced retry: Encountered SocketException, delaying between tries to allow recovery");
await Task.Delay(TimeSpan.FromMilliseconds(_enhancedHttpRetryHelper.DelayInMilliseconds), token);
}
}
catch (Exception ex) when (retry == maxRetries)
{
var message = string.Format(
CultureInfo.CurrentCulture,
Strings.Log_FailedToDownloadPackage,
identity,
url)
+ Environment.NewLine
+ ExceptionUtilities.DisplayMessage(ex);
logger.LogError(message);
}
}
return await processAsync(null);

Fix this by using Lazy<T> for initializing the fields which is thread-safe.

(*) The reason is that code like int RetryCount => _retryCount ??= 6; gets turned into:

int valueOrDefault = _retryCount.GetValueOrDefault();
if (!_retryCount.HasValue)
{
    valueOrDefault = 6;
    _retryCount = valueOrDefault;
    return valueOrDefault;
}
return valueOrDefault;

Suppose Thread A arrives first and calls GetValueOrDefault() (which is 0 for int) but then Thread B interjects and sets the value, now when Thread A resumes _retryCount.HasValue is true so we skip the if block and return valueOrDefault i.e. 0.

PR Checklist

  • PR has a meaningful title

  • PR has a linked issue.

  • Described changes

  • Tests

    • Automated tests added
    • OR
    • Test exception
    • OR
    • N/A
  • Documentation

    • Documentation PR or issue filled
    • OR
    • N/A

@akoeplinger akoeplinger requested a review from a team as a code owner July 4, 2024 11:01
@dotnet-policy-service dotnet-policy-service bot added the Community PRs created by someone not in the NuGet team label Jul 4, 2024
When multiple threads try to access the `RetryCount` property we sometimes hit a case where `RetryCount` returned 0 (*).
This caused the loop in `ProcessHttpSourceResultAsync()` to be skipped completely because `retry <= maxRetries`: https://github.com/NuGet/NuGet.Client/blob/6c642c2d63717acd4bf92050a42691928020eb89/src/NuGet.Core/NuGet.Protocol/Utility/FindPackagesByIdNupkgDownloader.cs#L258-L328

Fix this by using `Lazy<T>` for initializing the fields which is thread-safe.

Fixes NuGet/Home#13545

(*) The reason is that code like `int RetryCount => _retryCount ??= 6;` gets turned into:

```csharp
int valueOrDefault = _retryCount.GetValueOrDefault();
if (!_retryCount.HasValue)
{
    valueOrDefault = 6;
    _retryCount = valueOrDefault;
    return valueOrDefault;
}
return valueOrDefault;
```

Suppose Thread A arrives first and calls `GetValueOrDefault()` (which is 0 for int) but then Thread B interjects and sets the value, now when Thread A resumes `_retryCount.HasValue` is true so we skip the if block and return valueOrDefault i.e. 0.
Copy link
Contributor

@donnie-msft donnie-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I'm running our tests pipeline now...

Copy link
Member

@nkolev92 nkolev92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝

_delayInMilliseconds = new Lazy<int>(() => GetIntFromEnvironmentVariable(DelayInMillisecondsEnvironmentVariableName, defaultValue: DefaultDelayMilliseconds, _environmentVariableReader));
_retry429 = new Lazy<bool>(() => GetBoolFromEnvironmentVariable(Retry429EnvironmentVariableName, defaultValue: true, _environmentVariableReader));
_observeRetryAfter = new Lazy<bool>(() => GetBoolFromEnvironmentVariable(ObserveRetryAfterEnvironmentVariableName, defaultValue: DefaultObserveRetryAfter, _environmentVariableReader));
_maxRetyAfterDelay = new Lazy<TimeSpan>(() =>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_maxRetyAfterDelay -> _maxRetryAfterDelay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks: #6025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community PRs created by someone not in the NuGet team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RestoreTask randomly fails after upgrading to latest version
4 participants