Skip to content

Commit

Permalink
Merged PR 747091: Use full hostname for agents in ADO distributed builds
Browse files Browse the repository at this point in the history
Hostname resolution stopped working when removing the private subnet we were using to run distributed builds. It turns out that DNS resolution in the "default network" needs the hostnames to be qualified with a domain (https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances).

This PR adds an option (to be set by the AdoBuildRunner) so we can inject the correct hostnames in the build, without changing how this works for CloudBuild (where Dns.GetHostName works)

Related work items: #2116072
  • Loading branch information
marcelolynch authored and semihokur committed Oct 24, 2023
1 parent d934cd3 commit db078cf
Show file tree
Hide file tree
Showing 18 changed files with 88 additions and 54 deletions.
3 changes: 1 addition & 2 deletions .azdo/linux/job-selfhost.yml
Original file line number Diff line number Diff line change
Expand Up @@ -139,15 +139,14 @@ jobs:
# - we also disable early worker release to avoid releasing a worker before attachment, which tends to happen
# when the build is highly cached: the intention is to have as much of a distributed build as possible for validation purposes
# Set a 60m timeout so we can catch hangs *and* get logs collected at the same time. Otherwise the whole job will timeout (check 'timeoutInMinutes' above).
timeout --signal 9 60m ./bxl.sh --use-dev --use-adobuildrunner ${{ parameters.BxlCommonArgs }} /logsDirectory:"Out/Logs/${{ parameters.validationName }}" ${{ parameters.bxlExtraArgs }} "/f:tag='test'" /earlyWorkerRelease- /p:BuildXLWorkerAttachTimeoutMin=10 /logToKusto /logToKustoBlobUri:https://adomessages.blob.core.windows.net/adomessages /logToKustoIdentityId:6e0959cf-a9ba-4988-bbf1-7facd9deda51 /logToKustoTenantId:975f013f-7f24-47e8-a7d3-abc4752bf346 /historicMetadataCache-
timeout --signal 9 60m ./bxl.sh --use-dev --use-adobuildrunner ${{ parameters.BxlCommonArgs }} /logsDirectory:"Out/Logs/${{ parameters.validationName }}" ${{ parameters.bxlExtraArgs }} "/f:tag='test'" /earlyWorkerRelease- /p:BuildXLWorkerAttachTimeoutMin=5 /logToKusto /logToKustoBlobUri:https://adomessages.blob.core.windows.net/adomessages /logToKustoIdentityId:6e0959cf-a9ba-4988-bbf1-7facd9deda51 /logToKustoTenantId:975f013f-7f24-47e8-a7d3-abc4752bf346 /historicMetadataCache- /p:BuildXLGrpcVerbosityEnabled=1 /p:BuildXLGrpcVerbosityLevel=1 /dynamicBuildWorkerSlots:1
displayName: Test (${{ parameters.validationName }})
workingDirectory: /home/subst
env:
PAT1esSharedAssets: $(PAT-TseBuild-AzureDevOps-1esSharedAssets-Package-Read)
PATCloudBuild: $(PAT-TseBuild-AzureDevOps-CloudBuild-Packaging-Read)
VSTSPERSONALACCESSTOKEN: $(PAT-TseBuild-AzureDevOps-mseng-buildcache)
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
AdoBuildRunnerWaitForOrchestratorExit: true
AdoBuildRunnerInvocationKey: LinuxSelfhostValidation_${{ parameters.validationName }}
- task: PublishTestResults@2
Expand Down
2 changes: 1 addition & 1 deletion .azdo/linux/pipeline.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ extends:
# Build and test selfhost with BuildXL
- template: /.azdo/linux/job-selfhost.yml@self
parameters:
Distributed: false
Distributed: true
ValidationName: InternalRelease
BxlExtraArgs: --internal /q:ReleaseLinux /forceAddExecutionPermission-

Expand Down
1 change: 1 addition & 0 deletions Documentation/Wiki/Flags.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,7 @@ This page lists flags that can be used to configure BuildXL.
| LogToConsole | Displays the specified messages in the console. |
| LogToKusto | Whether to send log events to Kusto. If enabled, a valid authentication mechanism should be available with enough permissions to write into the blob storage account where logs are piped to Kusto. Use /logToKustoBlobUri:https://{storage-account-name}/{container-name} and /logToKustoIdentityId:{Identity guid} to specify the destination of the log messages. |
| LowPriority | Runs the build engine and all tools at a lower priority in order to provide better responsiveness to interactive processes on the current machine. |
| MachineHostName | Specifies the host name where the machine running the build can be reached. This value should only be overriden by build runners, never by a user. In particular, we need it to be overriddable because on ADO networks the machines are not reachable in the hostname that GetHostName returns, and we need a special suffix that is appended by the AdoBuildRunner. |
| ManageMemoryMode | Specifies the mode to manage memory under pressure. Defaults to CancellationRam where {ShortProductName} attemps to cancel processes. EmptyWorkingSet mode will empty working set of processes instead of cancellation. Suspend mode will suspend processes to free memory. |
| MaskUntrackedAccesses | When enabled, {ShortProductName} does not consider any access under untracked paths or scopes for sake of cache lookup. Defaults to on. |
| MaxCacheLookup | Specifies the maximum number of cache lookups that {ShortProductName} will launch at one time. The default value is three times the number of processors in the current machine. |
Expand Down
3 changes: 3 additions & 0 deletions Public/Src/App/Bxl/Args.cs
Original file line number Diff line number Diff line change
Expand Up @@ -790,6 +790,9 @@ public bool TryParse(string[] args, PathTable pathTable, out ICommandLineConfigu
OptionHandlerFactory.CreateBoolOption(
"lowPriority",
sign => schedulingConfiguration.LowPriority = sign),
OptionHandlerFactory.CreateOption(
"machineHostName",
opt => distributionConfiguration.MachineHostName = opt.Value),
OptionHandlerFactory.CreateOption(
"manageMemoryMode",
opt => schedulingConfiguration.ManageMemoryMode = CommandLineUtilities.ParseEnumOption<ManageMemoryMode>(opt)),
Expand Down
5 changes: 5 additions & 0 deletions Public/Src/App/Bxl/HelpText.cs
Original file line number Diff line number Diff line change
Expand Up @@ -956,6 +956,11 @@ public static void DisplayHelp(HelpLevel helpLevel)
Strings.HelpText_DisplayHelp_DistributedBuildOrchestratorLocation,
HelpLevel.Verbose);

hw.WriteOption(
"/machineHostName:<host name>",
Strings.HelpText_DisplayHelp_MachineHostName,
HelpLevel.Verbose);

hw.WriteOption(
"/enableWorkerSourceFileMaterialization[+|-]",
Strings.HelpText_DisplayHelp_DistributedBuildWorkerSourceMaterialization,
Expand Down
7 changes: 5 additions & 2 deletions Public/Src/App/Bxl/Strings.resx
Original file line number Diff line number Diff line change
Expand Up @@ -504,10 +504,13 @@
<data name="HelpText_DisplayHelp_DistributedBuildBanner" xml:space="preserve">
<value>DISTRIBUTED BUILD</value>
</data>
<data name="HelpText_DisplayHelp_DistributedBuildOrchestratorLocation" xml:space="preserve">
<data name="HelpText_DisplayHelp_DistributedBuildOrchestratorLocation" xml:space="preserve">
<value>Specifies the IP address or host name and TCP port of the orchestrator machine to which a worker will connect to join a build session. This argument is redundant if the orchestratro is invoked with /distributedBuildWorker specified for this worker. (short form: /dbo)</value>
</data>
<data name="HelpText_DisplayHelp_DistributedBuildWorker" xml:space="preserve">
<data name="HelpText_DisplayHelp_MachineHostName" xml:space="preserve">
<value>Specifies the host name where the machine running the build can be reached. This value should only be overriden by build runners, never by a user. In particular, we need it to be overriddable because on ADO networks the machines are not reachable in the hostname that GetHostName returns, and we need a special suffix that is appended by the AdoBuildRunner.</value>
</data>
<data name="HelpText_DisplayHelp_DistributedBuildWorker" xml:space="preserve">
<value>Specifies the IP address or host name and TCP port of remote worker build services which this process can dispatch work to during a distributed build (can specify multiple). This argument is redundant if the corresponding worker is invoked with /distributedBuildOrchestratorLocation specified. (short form: /dbw)</value>
</data>
<data name="HelpText_DisplayHelp_DistributedBuildServicePort" xml:space="preserve">
Expand Down
5 changes: 0 additions & 5 deletions Public/Src/Engine/Dll/Distribution/DistributionHelpers.cs
Original file line number Diff line number Diff line change
Expand Up @@ -74,11 +74,6 @@ public static ArraySegment<byte> ToArraySegmentByte(this ByteString byteString)
return new ArraySegment<byte>(byteString.ToByteArray());
}

internal static string GetServiceName(int port)
{
return GetServiceName(System.Net.Dns.GetHostName(), port);
}

internal static string GetServiceName(string ipAddress, int port)
{
return string.Format(CultureInfo.InvariantCulture, "{0}::{1}", ipAddress, port);
Expand Down
6 changes: 5 additions & 1 deletion Public/Src/Engine/Dll/Distribution/OrchestratorService.cs
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ internal ExecutionResultSerializer ResultSerializer
}
}

internal readonly string Hostname;

private readonly RemoteWorker[] m_remoteWorkers;
private readonly LoggingContext m_loggingContext;

Expand All @@ -76,6 +78,8 @@ public OrchestratorService(IDistributionConfiguration config, LoggingContext log
{
Contract.Requires(config != null && config.BuildRole.IsOrchestrator());

Hostname = config.MachineHostName;

// Create all remote workers
m_buildServicePort = config.BuildServicePort;
m_remoteWorkers = new RemoteWorker[config.RemoteWorkerCount];
Expand Down Expand Up @@ -381,7 +385,7 @@ public bool Hello(ServiceLocation workerLocation)
{
lock (m_remoteWorkers)
{
if (m_remoteWorkers.Any(rw => rw.Location.IpAddress == workerLocation.IpAddress && rw.Location.Port == workerLocation.Port))
if (m_remoteWorkers.Any(rw => rw.Location?.IpAddress == workerLocation.IpAddress && rw.Location?.Port == workerLocation.Port))
{
// We already know this worker (presumably, from the command line).
// Just acknowledge the RPC.
Expand Down
2 changes: 1 addition & 1 deletion Public/Src/Engine/Dll/Distribution/RemoteWorker.cs
Original file line number Diff line number Diff line change
Expand Up @@ -442,7 +442,7 @@ private async Task<bool> TryAttachAsync()
FingerprintSalt = m_orchestratorService.Environment.ContentFingerprinter.FingerprintSalt,
OrchestratorLocation = new ServiceLocation
{
IpAddress = Dns.GetHostName(),
IpAddress = m_orchestratorService.Hostname,
Port = m_orchestratorService.Port,
},
};
Expand Down
2 changes: 1 addition & 1 deletion Public/Src/Engine/Dll/Distribution/WorkerService.cs
Original file line number Diff line number Diff line change
Expand Up @@ -303,7 +303,7 @@ async Task IWorkerService.SayHelloAsync(IDistributionServiceLocation orchestrato
m_orchestratorInitialized = true;
m_orchestratorClient.Initialize(orchestratorLocation.IpAddress, orchestratorLocation.BuildServicePort, OnConnectionFailureAsync);

var helloResult = await m_orchestratorClient.SayHelloAsync(new ServiceLocation() { IpAddress = Dns.GetHostName(), Port = m_port });
var helloResult = await m_orchestratorClient.SayHelloAsync(new ServiceLocation() { IpAddress = m_config.Distribution.MachineHostName, Port = m_port });
if (!helloResult.Succeeded)
{
// If we can't say hello there is no hope for attachment
Expand Down
3 changes: 3 additions & 0 deletions Public/Src/Tools/AdoBuildRunner/Build/BuildContext.cs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@ public record BuildContext
/// <nodoc />
public required string AgentMachineName { get; init; }

/// <nodoc />
public required string AgentHostName { get; init; }

/// <nodoc />
public required string SourcesDirectory { get; init; }

Expand Down
13 changes: 8 additions & 5 deletions Public/Src/Tools/AdoBuildRunner/Build/BuildExecutor.cs
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,9 @@ public BuildExecutor(ILogger logger) : base(logger)
m_bxlExeLocation = Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)!, exeName);
}

private int ExecuteBuild(IEnumerable<string> arguments, string buildSourcesDirectory)
private int ExecuteBuild(BuildContext buildContext, IEnumerable<string> arguments, string buildSourcesDirectory)
{
var fullArguments = SetDefaultArguments(arguments);
var fullArguments = SetDefaultArguments(buildContext, arguments);

var process = new Process()
{
Expand Down Expand Up @@ -94,14 +94,15 @@ public void PrepareBuildEnvironment(BuildContext buildContext)
public int ExecuteSingleMachineBuild(BuildContext buildContext, string[] buildArguments)
{
Logger.Info($@"Launching single machine build");
return ExecuteBuild(buildArguments, buildContext.SourcesDirectory);
return ExecuteBuild(buildContext,buildArguments, buildContext.SourcesDirectory);
}

/// <inherit />
public int ExecuteDistributedBuildAsOrchestrator(BuildContext buildContext, string relatedSessionId, string[] buildArguments)
{
Logger.Info($@"Launching distributed build as orchestrator");
return ExecuteBuild(
buildContext,
buildArguments.Concat(new[]
{
$"/distributedBuildRole:orchestrator",
Expand All @@ -118,6 +119,7 @@ public int ExecuteDistributedBuildAsWorker(BuildContext buildContext, BuildInfo
Logger.Info($@"Launching distributed build as worker");

return ExecuteBuild(
buildContext,
// By default, set the timeout to 20min in the workers to avoid unnecessary waiting upon connection failures
// (defaults are placed in front of user-provided arguments).
new[]
Expand All @@ -142,7 +144,7 @@ public void InitializeAsWorker(BuildContext buildContext, string[] buildArgument
// No prep work to do
}

private static IEnumerable<string> SetDefaultArguments(IEnumerable<string> arguments)
private static IEnumerable<string> SetDefaultArguments(BuildContext buildContext, IEnumerable<string> arguments)
{
// The default values are added to the start of command line string.
// This way, user-provided arguments will be able to override the defaults.
Expand All @@ -160,7 +162,8 @@ private static IEnumerable<string> SetDefaultArguments(IEnumerable<string> argum
"/earlyWorkerRelease-",
// This flag could make sense to enable as default across the board (not only for ADO), but for now
// let's keep dev builds out of it until we can validate it doesn't introduce a regression.
"/useHistoricalCpuUsageInfo+"
"/useHistoricalCpuUsageInfo+",
$"/machineHostName:{buildContext.AgentHostName}"
};

// Enable gRPC encryption
Expand Down
2 changes: 1 addition & 1 deletion Public/Src/Tools/AdoBuildRunner/Build/BuildManager.cs
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ public async Task<int> BuildAsync(bool isOrchestrator)
if (isOrchestrator)
{
// The orchestrator creates the build info and publishes it to the build properties
var buildInfo = new BuildInfo { RelatedSessionId = Guid.NewGuid().ToString("D"), OrchestratorLocation = m_buildContext.AgentMachineName };
var buildInfo = new BuildInfo { RelatedSessionId = Guid.NewGuid().ToString("D"), OrchestratorLocation = m_buildContext.AgentHostName };
await m_vstsApi.PublishBuildInfo(m_buildContext, buildInfo);
returnCode = m_executor.ExecuteDistributedBuildAsOrchestrator(m_buildContext, buildInfo.RelatedSessionId, m_buildArguments);
}
Expand Down
Loading

0 comments on commit db078cf

Please sign in to comment.