Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Methods with both stackalloc and loops bypass tiering #85548

Open
AndyAyersMS opened this issue Apr 28, 2023 · 5 comments
Open

Methods with both stackalloc and loops bypass tiering #85548

AndyAyersMS opened this issue Apr 28, 2023 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

For example:

private bool TryGetNamedPropertyValue(
int startIndex,
int endIndex,
ReadOnlySpan<byte> propertyName,
out JsonElement value)
{
ReadOnlySpan<byte> documentSpan = _utf8Json.Span;
Span<byte> utf8UnescapedStack = stackalloc byte[JsonConstants.StackallocByteThreshold];
// Move to the row before the EndObject
int index = endIndex - DbRow.Size;
while (index > startIndex)
{
DbRow row = _parsedData.Get(index);

Our current method of tiering for methods with loops relies on OSR, but OSR currently cannot handle stackalloc. Instead, such methods are initially jitted with full optimization. Thus they do not benefit from tiering or Dynamic PGO.

Note this is not a new issue; things have been this way since .NET 7 where we enabled OSR, but it may become more pressing now that we are considering enabling PGO by default.

I don't know how many cases of this there are yet, or what if any performance we might gain by trying to address this. But I ran across one example here: #84264 (comment).

Adding stackalloc support to OSR is currently considered to be difficult. A method with stackalloc has two independently addressable areas of its stack frame (commonly handled by using both stack pointer and frame pointer to address locals). But an OSR method must potentially support three areas, and we currently do not have support in the jit or the diagnostics stack for a third stack frame base register.

If this combination turns out to be relatively uncommon, we might consider refactoring the code to avoid this pattern.

cc @dotnet/jit-contrib @stephentoub

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 28, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 28, 2023
@EgorBo
Copy link
Member

EgorBo commented Apr 29, 2023

@AndyAyersMS currently we sort of promote stackalloc to locals for size <= 32 bytes - do we still bypass tier0 for small stackallocs? and does it makes sense to extend that limit for OSR?

@AndyAyersMS
Copy link
Member Author

We decide we need to switch to full opt before importing, so before we know for sure how big the stackalloc(s) might be.

It looks like we try to do this conversion in tier0 too (in fact possibly even in minops, though both may end up getting blocked in practice as gtFoldExpr likely fails to fold).

I don't know if upping the limit makes sense or not, but if we do we should do it across the board and not just for these cases.

@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 29, 2023
@ghost
Copy link

ghost commented Apr 29, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

For example:

private bool TryGetNamedPropertyValue(
int startIndex,
int endIndex,
ReadOnlySpan<byte> propertyName,
out JsonElement value)
{
ReadOnlySpan<byte> documentSpan = _utf8Json.Span;
Span<byte> utf8UnescapedStack = stackalloc byte[JsonConstants.StackallocByteThreshold];
// Move to the row before the EndObject
int index = endIndex - DbRow.Size;
while (index > startIndex)
{
DbRow row = _parsedData.Get(index);

Our current method of tiering for methods with loops relies on OSR, but OSR currently cannot handle stackalloc. Instead, such methods are initially jitted with full optimization. Thus they do not benefit from tiering or Dynamic PGO.

Note this is not a new issue; things have been this way since .NET 7 where we enabled OSR, but it may become more pressing now that we are considering enabling PGO by default.

I don't know how many cases of this there are yet, or what if any performance we might gain by trying to address this. But I ran across one example here: #84264 (comment).

Adding stackalloc support to OSR is currently considered to be difficult. A method with stackalloc has two independently addressable areas of its stack frame (commonly handled by using both stack pointer and frame pointer to address locals). But an OSR method must potentially support three areas, and we currently do not have support in the jit or the diagnostics stack for a third stack frame base register.

If this combination turns out to be relatively uncommon, we might consider refactoring the code to avoid this pattern.

cc @dotnet/jit-contrib @stephentoub

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged, needs-area-label

Milestone: -

@teo-tsirpanis teo-tsirpanis removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 30, 2023
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 1, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone May 1, 2023
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented May 3, 2023

Recall OSR is an insurance policy to make sure we can escape from Tier0 code no matter what the method does—in particular it handles methods that have very long running loops.

Since TryGetNamedPropertyValue and other similar BCL methods are unlikely to run for 1000's of loop iterations and get stuck in Tier0 they don't actually "require" OSR to escape. So we can get an idea of the benefit we're forgoing by just disabling QJFL OSR and handling these as normally tiered methods.

@AndyAyersMS
Copy link
Member Author

The linked test (System.Text.Json.Document.Tests.Perf_DocumentParse.Parse(IsDataIndented: False, TestRandomAccess: True, TestCase: Json400KB)) is pretty noisy for me locally so hard to be confident ... disabling OSR may give a small perf boost but I can' be sure there is any real impact.

A different screen is to find out how many methods have this combination and perhaps from there find different impacted tests; let me try that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

5 participants