Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value num refine phis #104752

Merged
merged 8 commits into from
Jul 17, 2024
Merged

Value num refine phis #104752

merged 8 commits into from
Jul 17, 2024

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Jul 11, 2024

Alternative take on #104458.

VN doesn't always give PHI defs the best possible values (in particular if there are backedge PHI args). Revise VN to run intg a "loop-respecting" RPO where we don't visit any loop successors until all loop blocks have been visited. Once the loop is done, update the header PHI VNs since all PHI arg VNs are now known.

Then look for equivalent PHI defs in copy prop and enable copy prop when two locals have the same values at the head of a loop.

Addresses the regression case noted in #95645 (comment) where cross-block morph's copy prop plus loop bottom testing has created some unnecessary loop-carried values.

Closes #95645.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 11, 2024
@AndyAyersMS
Copy link
Member Author

@jakobbotsch FYI

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

{
if (vn1 == vn2)
{
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fine if boths are NoVN ? I mean can the caller assume it can apply some opts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect callers to screen these out. I guess I should add asserts here.

@AndyAyersMS
Copy link
Member Author

Diffs. TP is indeed down a bit from the other PR.

Looks like there's a set of novel test failures on arm/arm64.

@AndyAyersMS
Copy link
Member Author

The NAOT failure is looking suspiciously like another case of #88091. In System.Uri:CheckAuthorityHelper phi refinement allows RBO to jump thread both sides of a block that has a memory phi (w/o this change it just does one side); this converts a CSE def into a use. We set BBF_NO_CSE_IN on the block, but it has no effect as that block is now unreachable. I suspect this new CSE use is not valid but haven't gotten that far.

Need to try and extract this pattern as a repro case. Not sure why it is seemingly arm/naot only yet, maybe the CSE only happens there.

@AndyAyersMS
Copy link
Member Author

Repro case is something like the following

using System.Runtime.CompilerServices;

unsafe
{
    string example = "./";
    fixed (char* p = example)
    {
        int result = Problem(p, 0, example.Length);
        Console.WriteLine($"result={result}");
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Expose(ref int x)
{ }

static unsafe int Problem(char* pString, int idx, int length)
{
    bool dotFound = false;
    int end = 0;
    Expose(ref end);
    for (end = idx; end < length; ++end)
    {
        if (dotFound && (pString[end] == '/' || pString[end] == '?' || pString[end] == '#'))
            break;
        else if (end < (idx + 2) && pString[end] == '.')
        {
            // allow one or two dots
            dotFound = true;
        }
        else
        {
            //failure
            return -1;
        }
    }

    return 0;
}

(if end is not exposed, other things block jump threading).

Here RBO can see dotFound is false if we've just entered the loop and also can now see it is true if we've come around the backedge. So the flow is now more like

    while (true)
    {
        if (end < (idx + 2) && pString[end] == '.')
        {
            //
        }
        else
        {
            return -1;
        }

        end++;

        if (end < length) break;

        if (pString[end] == '/' || pString[end] == '?' || pString[end] == '#')
        {
            break;
        }
    }

and CSE commons up all the pString[end] occurrences despite the increment of the index in between, since they all have the same VNs (in the initial loop order they indeed all have the same value, but we don't CSE there because the first reads are conditional).

One possible fix is to rework the BBF_NO_CSE_IN mechanism to be a flow edge annotation instead of a block annotation. When we jump thread past a memory PHI we could mark all the new pred edges this way and block CSE dataflow. But we currently don't have any flow edge annotations. And this may not be sufficiently durable, say if we can subsequently re-thread one of those new edges.

Another though is to try and somehow capture the dependence of the memory load VNs on the VN for end, but I'm not sure how to approach that.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jul 15, 2024

Though not particularly elegant, ccdca2a has fairly minimal diffs on its own. Hopefully this forestalls the need to do something more elaborate to avoid RBO tripping up CSE.

#104922 tests just that change in isolation.

@AndyAyersMS
Copy link
Member Author

Latest diffs

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS
Copy link
Member Author

There is a related stress failure with the assertion prop check.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS
Copy link
Member Author

Libraries stress failures look like #104153, #103459, and #102706.

runtime jitstress failure (nativeruntimeeventsource) not clear

@AndyAyersMS AndyAyersMS marked this pull request as ready for review July 17, 2024 14:31
@AndyAyersMS
Copy link
Member Author

@jakobbotsch PTAL
cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member Author

TP cost is all in copyprop, I think we can get some/most of this back by reducing the number of candidates it considers.

@@ -6770,6 +6770,138 @@ const char* ValueNumStore::VNRelationString(VN_RELATION_KIND vrk)
}
#endif

bool ValueNumStore::IsVNPhiDef(ValueNum vn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll remove it soon, if I don't find some other use for it.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Perf] Regressions from cross-block local assertion prop
3 participants