[.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop #94831

DierkDroth · 2023-11-16T05:42:15Z

Description

So far I built my ARM64 binary by
dotnet publish -c Release -r linux-arm64 -p:PublishSingleFile=true -p:SelfContained=true -p:PublishReadyToRun=true MyProject.csproj

Here are my project settings:

	<PropertyGroup>
		<OutputType>Exe</OutputType>
		<TargetFramework>net8.0</TargetFramework>
		<VersionPrefix>1.10.14</VersionPrefix>
		<RunAnalyzersDuringLiveAnalysis>False</RunAnalyzersDuringLiveAnalysis>
		<RunAnalyzersDuringBuild>False</RunAnalyzersDuringBuild>
		<EnableNETAnalyzers>False</EnableNETAnalyzers>
		<ProduceReferenceAssembly>False</ProduceReferenceAssembly>
		<PublishTrimmed>true</PublishTrimmed>
	</PropertyGroup>

Turns out <PublishTrimmed>true</PublishTrimmed> no longers works: the build which tools ~3 minutes on .NET 7 now did not terminate the 'optimization phase' after > 9 hours.

What's going wrong and how could I trim my build as before?

Reproduction Steps

Please see above

Expected behavior

Trimming should work with .NET8 as it worked with .NET 7

Actual behavior

Trimming does not work with .NET8 as it worked with .NET 7

Regression?

Yes

Known Workarounds

None

Configuration

No response

Other information

No response

The text was updated successfully, but these errors were encountered:

ghost · 2023-11-16T05:48:01Z

Tagging subscribers to this area: @agocke, @sbomer, @vitek-karas
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

So far I built my ARM64 binary by
dotnet publish -c Release -r linux-arm64 -p:PublishSingleFile=true -p:SelfContained=true -p:PublishReadyToRun=true MyProject.csproj

Here are my project settings:

	<PropertyGroup>
		<OutputType>Exe</OutputType>
		<TargetFramework>net8.0</TargetFramework>
		<VersionPrefix>1.10.14</VersionPrefix>
		<RunAnalyzersDuringLiveAnalysis>False</RunAnalyzersDuringLiveAnalysis>
		<RunAnalyzersDuringBuild>False</RunAnalyzersDuringBuild>
		<EnableNETAnalyzers>False</EnableNETAnalyzers>
		<ProduceReferenceAssembly>False</ProduceReferenceAssembly>
		<PublishTrimmed>true</PublishTrimmed>
	</PropertyGroup>

Turns out <PublishTrimmed>true</PublishTrimmed> no longers works: the build which tools ~3 minutes on .NET 7 now did not terminate the 'optimization phase' after > 9 hours.

What's going wrong and how could I trim my build as before?

Reproduction Steps

Please see above

Expected behavior

Trimming should work with .NET8 as it worked with .NET 7

Actual behavior

Trimming does not work with .NET8 as it worked with .NET 7

Regression?

Yes

Known Workarounds

None

Configuration

No response

Other information

No response

Author:	DierkDroth
Assignees:	-
Labels:	`untriaged`, `area-Tools-ILLink`, `needs-area-label`
Milestone:	-

vitek-karas · 2023-11-16T11:06:35Z

Thanks for reporting this. Unfortunately, there's not enough info here to figure out what's going on.

Could you please try some of these:

Running the "hanging" publish - can you look what process is hanging. I would expect it to be eating CPU cycles instead of just waiting on something. If it's related to trimming it would be a process dotnet.exe and with illink.dll on the command line - but I'm guessing.
If you find that process, can you try attaching a debugger to it (VS attach debugger would work well for this)? Then break the debugger and look for threads which have something interesting on the callstack - if it's trimming related the code for that lives in Mono.Linker namespace - the callstacks should be safe to share, the trimmer runs only code from the trimmer and the framework, there won't be anything from your project visible on the stack.
I assume you can't share your project, but maybe you could share the list of NuGet dependencies (it's common that the problem is caused by some code from the NuGet dependencies and those are public, so should not be a problem to share)
Anything special about your project you can think of? (I know it's a vague question, but you never know)

DierkDroth · 2023-11-16T13:46:56Z

Thanks for getting back to me @vitek-karas.

Curious: would it help if I attached the untrimmed binary so you could try running the trim logic on it?

vitek-karas · 2023-11-16T16:50:51Z

We would need all of the input binaries and the trimmer command line. You could try using https://github.com/vitek-karas/illinkrepro. Basically you run your publish with /bl to produce msbuild.binlog. The tool in that repo then parses that and extracts the trimmer command line from it. Then it copies all of the input assemblies into a repro directory and constructs a copy of the command line. If you're comfortable sharing that, it should let me reproduce the failure/hang exactly. The repro will contain all of the input assemblies (not source code, compiled IL) and the command line to trimmer (which may sometimes contains certain local paths).

DierkDroth · 2023-11-16T16:57:50Z

Thanks @vitek-karas.

To clarify:

I run the build without the trimming option but with the /bl option
this creates a msbuild.binlog file

Could I just attach that msbuild.binlog here and you then would have what you needed to further analyze the case?

Note: I already had quite a few support cases/issues with the great Platform UNO team, where I happily shared BINLOG files. That's why I wouldn't have any concern on sharing that file.

Please let me know if you felt that would be a way to go or if I rather should go through the steps you outlined above. Thanks

vitek-karas · 2023-11-16T17:39:15Z

Small correction:

You run the publish command which hangs (no changes) just add /bl
This creates the "interesting" binlog

The binlog is unfortunately not enough - it contains the command line to the trimmer, but not the input assemblies. For that something like the tool from the illinkrepro repo, to collect the input assemblies as well, is needed.

We can also arrange for a more private way of sending this data, if you want. The best way then is to create a Visual Studio Feedback item and attach the data there.

DierkDroth · 2023-11-16T19:43:36Z

Thanks @vitek-karas. I just created a repo and filed it as Visual Studio Feedback item.

Side note: It would make the handling of your illinkrepo tool more handy if just a single, self-contained file would be created. I had to copy the illinkrepo.* files + all dlls to the target directory.

Please let me know if you needed anything else.

jeffschwMSFT · 2023-11-17T15:51:57Z

Cross linking - https://developercommunity.visualstudio.com/t/In-reference-to-case-https:githubcom/10518176

vitek-karas · 2023-11-20T16:24:01Z

@DierkDroth thanks a lot for the repro - I was able to investigate.

@sbomer for ideas about fixing:

The problem is a combination of the interprocedural data flow and a limit in ValueSet to avoid exponential state expansion. The way we implement the limit is to reset the size of the ValueSet when it reaches a limit (256 right now). This works well except when the value is part of hoisted local set. In that case the reset modifies the interprocedural state and causes the analysis to never reach a stable state. It basically goes from 0 to the limit size, some of the values in between are captured in the state. Then it resets back to 0, so the next iteration will see a different value, and it will start to grow again.

The repro for this is a very large state machine. The state field will be assigned constant values (in this app's case it's 533 different values), and that will grow the set, eventually it reaches 256 values and resets back to 0, but that we notice that one of the numbers we've added is missing, so we add it again -> grows to 256 again, and so on and on.

A "Simple" repro looks like:

			[ExpectedWarning ("IL2072", CompilerGeneratedCode = true)]
			static async void VeryLargeStateMachine()
			{
				Type t = GetWithPublicFields ();

				// 100
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();

				// 200
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();

				// 300
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();
				await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync (); await MethodAsync ();

				t.RequiresAll ();
			}

vitek-karas · 2023-11-20T16:27:34Z

As for the potential fix... the only idea I was able to come up with is to:

We can't reset the ValueSet to empty, because it removes already added state, and we rely on the value sets to be "Add only", otherwise we can't guarantee reaching a stable state.
If we hit the limit of values, set the value to "overflow" special value and then in Meet - "overflow" meet "anything" => "overflow". That way the value will eventually stabilize on "overflow".
We would then treat the "overflow" as an empty set for the purposes of enumeration of values - this is to avoid producing warnings (empty set causes the analysis to "skip" actions on it).

DierkDroth · 2023-11-20T16:58:23Z

Thanks for your feedback @vitek-karas. I'm happy to help and delighted that you had been able to isolate the cause of trouble.

Curious: you mentioned 'exponential state expansion'. Would you consider my app as being rather 'big', that this scenario was not uncovered before?!?

sbomer · 2023-11-20T17:21:30Z

@vitek-karas thanks for investigating. I agree with your assessment - it didn't occur to me at the time that resetting the values in the meet operation breaks the invariant that meet should always move down the lattice. I agree we should add a special "overflow" (or "bottom") value with the meet behavior you described. Maybe it would be better to treat it as an unknown value (treat it as a pattern we couldn't analyze and produce a warning) instead of skipping it.

vitek-karas · 2023-11-20T18:03:16Z

@DierkDroth

Your app is not that big, although there is one method in it which is very large and has LOT of async calls in it. That's what caused the problem. But overall it's not that crazy ("only ~500 await calls"). The exponential state problem is that the tooling simulates all potential values a variable can take in the method, and it's very easy to blow that set out of the water. For example:

				typeof (TestType).RequiresAll (); // Force data flow analysis

				object[] data = new object[20];
				if (true) data[0] = new object ();
				if (true) data[1] = new object ();
				if (true) data[2] = new object ();
				if (true) data[3] = new object ();
				if (true) data[4] = new object ();
				if (true) data[5] = new object ();
				if (true) data[6] = new object ();
				if (true) data[7] = new object ();
				if (true) data[8] = new object ();
				if (true) data[9] = new object ();
				if (true) data[10] = new object ();
				if (true) data[11] = new object ();
				if (true) data[12] = new object ();
				if (true) data[13] = new object ();
				if (true) data[14] = new object ();
				if (true) data[15] = new object ();
				if (true) data[16] = new object ();
				if (true) data[17] = new object ();
				if (true) data[18] = new object ();
				if (true) data[19] = new object ();

Now just assume that tool is not smart enough to figure out the condition, that it has to assume in each if both branches can be taken. Then the number of possible states for this test is on the order of 2^20. The tool obviously can't represent it in memory, so we have to "give up" at some point.

vitek-karas · 2023-11-20T18:05:37Z

@sbomer
I guess you're right. The trimmer should play it safe, and thus if it can't figure something out, it should warn. In this case it might be annoying, but it's probably better than saying "everything's fine" (insert your favorite fire meme ;-))

@vitek-karas

#87634 introduced a limit on the number of values tracked in dataflow analysis, but this approach was incorrect because resetting the set of tracked values was effectively moving up the dataflow lattice, breaking the invariant and causing potential hangs. The hang was observed in #94831, where (as found by @vitek-karas) the hoisted local `state` field of an async method with many await points was getting a large number of tracked values, hitting the limit, being reset to "empty", but then growing again, causing the analysis not to converge. The fix is to instead introduce a special value `ValueSet<TValue>.Unknown` that is used for this case. It acts like "bottom" in the lattice, so that it destroys any other inputs on meet ("bottom" meet anything else is "bottom"). In this testcase the `state` field isn't involved in dataflow warnings, so this actually allows the method to be analyzed correctly, producing the expected warning for the `Type` local that flows across all of the await points. Fixes #94831

@vitek-karas

…et#95041) dotnet#87634 introduced a limit on the number of values tracked in dataflow analysis, but this approach was incorrect because resetting the set of tracked values was effectively moving up the dataflow lattice, breaking the invariant and causing potential hangs. The hang was observed in dotnet#94831, where (as found by @vitek-karas) the hoisted local `state` field of an async method with many await points was getting a large number of tracked values, hitting the limit, being reset to "empty", but then growing again, causing the analysis not to converge. The fix is to instead introduce a special value `ValueSet<TValue>.Unknown` that is used for this case. It acts like "bottom" in the lattice, so that it destroys any other inputs on meet ("bottom" meet anything else is "bottom"). In this testcase the `state` field isn't involved in dataflow warnings, so this actually allows the method to be analyzed correctly, producing the expected warning for the `Type` local that flows across all of the await points. Fixes dotnet#94831

DierkDroth · 2023-11-28T05:43:42Z

@sbomer thanks for working on that issue.

Curious: would this result in a .NET8 update? If so, what would be the expected timeframe?

sbomer · 2023-11-28T18:20:59Z

Thank you for reporting it! I'm backporting the fix into .NET 8 at #95302 - if approved, it should be part of the next .NET 8 servicing release, so probably one of the next Patch Tuesdays.

DierkDroth · 2023-11-28T19:08:58Z

Excellent! Thanks for your feedback @sbomer

@vitek-karas

…) (#95302) #87634 introduced a limit on the number of values tracked in dataflow analysis, but this approach was incorrect because resetting the set of tracked values was effectively moving up the dataflow lattice, breaking the invariant and causing potential hangs. The hang was observed in #94831, where (as found by @vitek-karas) the hoisted local `state` field of an async method with many await points was getting a large number of tracked values, hitting the limit, being reset to "empty", but then growing again, causing the analysis not to converge. The fix is to instead introduce a special value `ValueSet<TValue>.Unknown` that is used for this case. It acts like "bottom" in the lattice, so that it destroys any other inputs on meet ("bottom" meet anything else is "bottom"). In this testcase the `state` field isn't involved in dataflow warnings, so this actually allows the method to be analyzed correctly, producing the expected warning for the `Type` local that flows across all of the await points. Fixes #94831

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 16, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Nov 16, 2023

DierkDroth changed the title ~~[.NET 8] <PublishTrimmed> no longers work but throws build process for an infinite loop~~ [.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop Nov 16, 2023

lewing added the area-Tools-ILLink .NET linker development as well as trimming analyzers label Nov 16, 2023

lewing removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 16, 2023

marek-safar removed the untriaged New issue has not been triaged by the area owner label Nov 16, 2023

marek-safar added this to the 8.0.x milestone Nov 16, 2023

marek-safar added the regression-from-last-release label Nov 16, 2023

sbomer self-assigned this Nov 20, 2023

sbomer mentioned this issue Nov 20, 2023

Fix ILLink/ILC hang when tracking too many hoisted local values #95041

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 20, 2023

sbomer closed this as completed in #95041 Nov 27, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 27, 2023

github-actions bot locked and limited conversation to collaborators Dec 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop #94831

[.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop #94831

DierkDroth commented Nov 16, 2023 •

edited

Loading

ghost commented Nov 16, 2023

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023 •

edited

Loading

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023

jeffschwMSFT commented Nov 17, 2023

vitek-karas commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

DierkDroth commented Nov 20, 2023 •

edited

Loading

sbomer commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

DierkDroth commented Nov 28, 2023

sbomer commented Nov 28, 2023

DierkDroth commented Nov 28, 2023

[.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop #94831

[.NET 8] <PublishTrimmed> no longers works but throws build process for an infinite loop #94831

Comments

DierkDroth commented Nov 16, 2023 • edited Loading

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

ghost commented Nov 16, 2023

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023 • edited Loading

vitek-karas commented Nov 16, 2023

DierkDroth commented Nov 16, 2023

jeffschwMSFT commented Nov 17, 2023

vitek-karas commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

DierkDroth commented Nov 20, 2023 • edited Loading

sbomer commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

vitek-karas commented Nov 20, 2023

DierkDroth commented Nov 28, 2023

sbomer commented Nov 28, 2023

DierkDroth commented Nov 28, 2023

DierkDroth commented Nov 16, 2023 •

edited

Loading

DierkDroth commented Nov 16, 2023 •

edited

Loading

DierkDroth commented Nov 20, 2023 •

edited

Loading