.NET 8.0.10 vs 9.0.0 RC2 GC Server Performance Regression in Sep (CSV Parser) Benchmark (due to DATAS default) #109047

Open
nietras opened this issue Oct 19, 2024 · 6 comments
Labels
area-GC-coreclr tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@nietras
Contributor

nietras commented Oct 19, 2024

In https://github.com/nietras/Sep (a fast, highly optimized CSV parser) I have been comparing performance between .NET 8 and .NET 9 RC2 with comparison-bench.ps1, and have observed what appears to be a consistent and significant performance regression when using ServerGarbageCollection (true). The benchmark in question is also discussed in https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers

Benchmarks can be run by cloning the Sep repo, checking out the net9.0 branch, and running the command in comparison-bench.ps1, perhaps adding `--filter *GcServer*Sep*` or similar. Benchmark and machine details, as reported by BenchmarkDotNet, are given below.

As can be seen, this shows a regression in a scenario with many medium-sized object allocations, ranging from 500.3 ms/429.7 ms = 1.16x (single-threaded) to 174.8 ms/103.0 ms = 1.70x (multi-threaded).
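These regression factors are the .NET 9.0 Mean divided by the .NET 8.0 Mean for the 1,000,000-row Server GC runs in the results table. A minimal sketch that recomputes them in a POSIX shell with awk; the `ratio` helper is a hypothetical name introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: divide the .NET 9.0 Mean by the .NET 8.0 Mean, rounded to 2 decimals.
ratio() { awk -v num="$1" -v den="$2" 'BEGIN { printf "%.2f\n", num / den }'; }

# Mean values (ms) for Rows=1000000, Server=True, copied from the results table.
echo "Sep______ (single-threaded): $(ratio 500.250 429.654)x"  # 1.16x
echo "Sep_MT___ (multi-threaded):  $(ratio 174.802 102.979)x"  # 1.70x
```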

I know there have been changes to the GC; my question is whether this regression is expected. I also just wanted to flag it in case it is of interest.

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-YVJTZC : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-ZDJCYM : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2

Server=True  InvocationCount=Default  IterationTime=350ms  
MaxIterationCount=15  MinIterationCount=5  WarmupCount=6  
Quotes=False  Reader=String  

| Method    | Runtime  | Scope | Rows    | Mean       | Ratio | MB  | MB/s   | ns/row | Allocated   | Alloc Ratio |
|---------- |--------- |------ |-------- |-----------:|------:|----:|-------:|-------:|------------:|------------:|
| Sep______ | .NET 8.0 | Asset | 50000   | 21.402 ms  |  1.00 | 29  | 1363.5 |  428.0 | 14133102 B  |        1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 50000   | 5.576 ms   |  0.26 | 29  | 5233.7 |  111.5 | 14308501 B  |        1.01 |
| Sep______ | .NET 9.0 | Asset | 50000   | 24.444 ms  |  1.14 | 29  | 1193.8 |  488.9 | 14133077 B  |        1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 50000   | 8.965 ms   |  0.42 | 29  | 3255.0 |  179.3 | 14310332 B  |        1.01 |
| Sep______ | .NET 8.0 | Asset | 1000000 | 429.654 ms |  1.00 | 583 | 1358.7 |  429.7 | 273063216 B |        1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 102.979 ms |  0.24 | 583 | 5668.9 |  103.0 | 274049328 B |        1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 500.250 ms |  1.16 | 583 | 1167.0 |  500.3 | 273062592 B |        1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 174.802 ms |  0.41 | 583 | 3339.7 |  174.8 | 273973628 B |        1.00 |
@nietras nietras added the tenet-performance Performance related issue label Oct 19, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 19, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 19, 2024
@EgorBo
Member

EgorBo commented Oct 19, 2024

Try with DATAS disabled, e.g. `<GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>`
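The same switch can also be set through environment variables instead of the project file. A minimal sketch, assuming a POSIX shell (PowerShell would use `$env:` assignments instead); `DOTNET_gcServer` and `DOTNET_GCDynamicAdaptationMode` are the runtime's documented GC environment variables, and this affects only processes started from the same shell afterwards.

```shell
#!/bin/sh
# Select Server GC, the GC mode that DATAS applies to.
export DOTNET_gcServer=1
# 0 disables DATAS; 1 enables it (DATAS is on by default with Server GC in .NET 9).
export DOTNET_GCDynamicAdaptationMode=0
echo "gcServer=$DOTNET_gcServer GCDynamicAdaptationMode=$DOTNET_GCDynamicAdaptationMode"
```

Any `dotnet` command launched afterwards from that shell, e.g. `dotnet run -c Release`, inherits both variables.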

@stephentoub
Member

cc: @mangod9, @Maoni0

@nietras
Copy link
Contributor Author

nietras commented Oct 19, 2024

Command I run from the net9.0 branch:

```
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep*
```

No change with `<GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>`, but I can't remember whether BDN actually forwards this to its sub-processes. Is there a flag to tell BDN to use this, like Server=True?

Server=True  InvocationCount=Default  IterationTime=350ms
MaxIterationCount=15  MinIterationCount=5  WarmupCount=6
Quotes=False  Reader=String

| Method    | Runtime  | Scope | Rows    | Mean     | Ratio | MB  | MB/s   | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 431.7 ms |  1.00 | 583 | 1352.1 |  431.7 | 260.41 MB |        1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 111.1 ms |  0.26 | 583 | 5252.6 |  111.1 |  261.2 MB |        1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 500.7 ms |  1.16 | 583 | 1165.9 |  500.7 | 260.42 MB |        1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 178.4 ms |  0.41 | 583 | 3272.0 |  178.4 | 261.32 MB |        1.00 |

@nietras
Contributor Author

nietras commented Oct 19, 2024

Yes, it's DATAS. I tried setting it with an environment variable, which for BDN is e.g. `--envVars DOTNET_GCDynamicAdaptationMode:0`, and ran with both 0 and 1, as can be seen below. This means the "regression" is solely due to DATAS being the default; otherwise there is no difference.

NO DATAS

```
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep* --envVars DOTNET_GCDynamicAdaptationMode:0
```

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-KKDGWQ : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-HUTQEJ : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=0  Server=True  InvocationCount=Default
IterationTime=350ms  MaxIterationCount=15  MinIterationCount=5
WarmupCount=6  Quotes=False  Reader=String

| Method    | Runtime  | Scope | Rows    | Mean     | Ratio | MB  | MB/s   | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 452.7 ms |  1.00 | 583 | 1289.6 |  452.7 | 260.41 MB |        1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 112.4 ms |  0.25 | 583 | 5195.4 |  112.4 | 261.51 MB |        1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 445.3 ms |  0.98 | 583 | 1310.9 |  445.3 | 260.41 MB |        1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 117.8 ms |  0.26 | 583 | 4954.0 |  117.8 | 261.38 MB |        1.00 |

DATAS

```
dotnet run -c Release -f net8.0 --project src/Sep.ComparisonBenchmarks/Sep.ComparisonBenchmarks.csproj -- -m --warmupCount 6 --minIterationCount 5 --maxIterationCount 15 --runtimes net80 net90 --iterationTime 350 --hide Type Quotes Reader RatioSD Gen0 Gen1 Gen2 Error Median StdDev --filter *GcServerLongAsset*Sep* --envVars DOTNET_GCDynamicAdaptationMode:1
```

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-ZORNME : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-BHTHZN : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCDynamicAdaptationMode=1  Server=True  InvocationCount=Default
IterationTime=350ms  MaxIterationCount=15  MinIterationCount=5
WarmupCount=6  Quotes=False  Reader=String

| Method    | Runtime  | Scope | Rows    | Mean     | Ratio | MB  | MB/s   | ns/row | Allocated | Alloc Ratio |
|---------- |--------- |------ |-------- |---------:|------:|----:|-------:|-------:|----------:|------------:|
| Sep______ | .NET 8.0 | Asset | 1000000 | 527.5 ms |  1.00 | 583 | 1106.6 |  527.5 | 260.41 MB |        1.00 |
| Sep_MT___ | .NET 8.0 | Asset | 1000000 | 170.0 ms |  0.32 | 583 | 3433.5 |  170.0 | 261.41 MB |        1.00 |
| Sep______ | .NET 9.0 | Asset | 1000000 | 528.2 ms |  1.00 | 583 | 1105.2 |  528.2 | 260.41 MB |        1.00 |
| Sep_MT___ | .NET 9.0 | Asset | 1000000 | 182.9 ms |  0.35 | 583 | 3192.2 |  182.9 | 261.17 MB |        1.00 |
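To put a number on the DATAS effect, a minimal sketch that divides each DATAS-on Mean (`DOTNET_GCDynamicAdaptationMode=1`) by the matching DATAS-off Mean (`=0`) from the two 1,000,000-row tables above; `ratio` is a hypothetical helper introduced only for this illustration, and values above 1 mean DATAS is slower.

```shell
#!/bin/sh
# Hypothetical helper: DATAS-on Mean divided by DATAS-off Mean, rounded to 2 decimals.
ratio() { awk -v num="$1" -v den="$2" 'BEGIN { printf "%.2f\n", num / den }'; }

# Mean values (ms), DATAS on first, DATAS off second.
echo ".NET 8.0 Sep______: $(ratio 527.5 452.7)x"  # 1.17x
echo ".NET 8.0 Sep_MT___: $(ratio 170.0 112.4)x"  # 1.51x
echo ".NET 9.0 Sep______: $(ratio 528.2 445.3)x"  # 1.19x
echo ".NET 9.0 Sep_MT___: $(ratio 182.9 117.8)x"  # 1.55x
```

Both runtimes slow down by a similar factor once DATAS is enabled, which is consistent with the regression coming from the DATAS default rather than from other .NET 9 changes.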

@nietras nietras changed the title .NET 8.0.10 vs 9.0.0 RC2 GC Server Performance Regression in Sep (CSV Parser) Benchmark .NET 8.0.10 vs 9.0.0 RC2 GC Server Performance Regression in Sep (CSV Parser) Benchmark (due to DATAS default) Oct 19, 2024
@vcsjones vcsjones added area-GC-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 19, 2024
@mangod9
Member

mangod9 commented Oct 19, 2024

Yeah, a throughput regression for certain microbenchmark scenarios is expected with DATAS. I assume the benchmark shows improved working set utilization?

@hez2010
Contributor

hez2010 commented Oct 19, 2024

It is expected in .NET 9.

In general, DATAS should benefit real-world applications a lot, as it can greatly reduce the working set and also improve GC latency, though it comes with a minor throughput penalty.

In a similar issue (#101006) I ran a binary-tree allocation benchmark and got the following results on .NET 9 RC2:

[Image: binary-tree allocation benchmark results on .NET 9 RC2 (image-6.png)]

Considering the large improvements in latency and working set, I would accept the minor throughput regression.
