On x86_64, JIT could reorder numeric operations to use the flag for subsequent conditional branch but does not do so #109042

Open
neon-sunset opened this issue Oct 19, 2024 · 1 comment
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue), untriaged (New issue has not been triaged by the area owner)

neon-sunset commented Oct 19, 2024

Description

Given this simple program:

static unsafe void Iterate(int* nums, nuint cnt) {
    var sum = 0;
    var iter = new PtrIter<int>(nums, cnt);

    while (iter.Next(out var n)) {
        sum += n;
    }

    Console.WriteLine(sum);
}

unsafe struct PtrIter<T>(T* ptr, nuint count)
where T: unmanaged {
    public bool Next(out T item) {
        if (count != 0) {
            item = *ptr;
            ptr++;
            count--;
            return true;
        }
        item = default;
        return false;
    }
}

Iterate compiles to:

G_M000_IG01:                ;; offset=0x0000
       sub      rsp, 40
G_M000_IG02:                ;; offset=0x0004
       xor      eax, eax
       test     rdx, rdx
       je       SHORT G_M000_IG04
       align    [0 bytes for IG03]
G_M000_IG03:                ;; offset=0x000B
       mov      r8d, dword ptr [rcx]
       add      rcx, 4
       dec      rdx
       add      eax, r8d
       test     rdx, rdx ;; <-- redundant if dec is moved after add: jne could then consume the zero flag set by dec directly
       jne      SHORT G_M000_IG03
G_M000_IG04:                ;; offset=0x001D
       mov      ecx, eax
       call     [System.Console:WriteLine(int)]
       nop      
G_M000_IG05:                ;; offset=0x0026
       add      rsp, 40
       ret

which is noticeably worse than the code generated for an equivalent foreach over a plain array:

G_M000_IG02:                ;; offset=0x0000
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M000_IG05
G_M000_IG03:                ;; offset=0x0009
       add      rcx, 16
       align    [0 bytes for IG04]
G_M000_IG04:                ;; offset=0x000D
       add      eax, dword ptr [rcx]
       add      rcx, 4
       dec      edx
       jne      SHORT G_M000_IG04
G_M000_IG05:                ;; offset=0x0017
       mov      ecx, eax
G_M000_IG06:                ;; offset=0x0019
       tail.jmp [System.Console:WriteLine(int)]
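
The source for the array version is not included above. A minimal sketch of what it presumably looks like, assuming a plain int[] parameter (the class and method names here are hypothetical):

```csharp
using System;

static class ArraySketch {
    // Hypothetical array counterpart of Iterate: same accumulation,
    // but driven by a foreach over an int[] instead of PtrIter<int>.
    public static void IterateArray(int[] nums) {
        var sum = 0;

        foreach (var n in nums) {
            sum += n;
        }

        Console.WriteLine(sum);
    }
}
```

For example, ArraySketch.IterateArray(new[] { 1, 2, 3 }) prints 6.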

Analysis

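In the PtrIter loop, dec rdx already sets the zero flag, but the add eax, r8d that follows overwrites the flags, so the JIT emits test rdx, rdx before jne. The test could be elided if the JIT gained a peephole that reorders arithmetic instructions so that the one whose flags a subsequent conditional branch can consume comes last.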
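
At the source level, the desired instruction order corresponds to accumulating first and decrementing the counter last, right before the loop condition. A hand-written sketch of that shape (FlagOrderSketch and SumCountdown are hypothetical names, and whether the JIT actually emits dec followed directly by jne for this form has not been verified here):

```csharp
using System;

static unsafe class FlagOrderSketch {
    // Hypothetical hand-written equivalent of the loop in Iterate,
    // returning the sum instead of printing it.
    public static int SumCountdown(int* nums, nuint cnt) {
        var sum = 0;
        if (cnt != 0) {
            do {
                sum += *nums;      // accumulate first (add overwrites flags on x86-64)
                nums++;
            } while (--cnt != 0);  // decrement last, immediately before the branch
        }
        return sum;
    }
}
```

This only illustrates the ordering; the point of the issue is that the JIT could arrive at it without the source being restructured.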

Another minor note: the mov and add in the loop body could be merged into a single add eax, dword ptr [rcx], as in the array version.

I have also noticed that merging the pointer dereference and post-increment into *ptr++ leads to worse codegen overall (and breaks the otherwise perfect ARM64 output), even though it shouldn't.
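
For reference, the merged variant mentioned here would look roughly like this (PtrIterMerged is a hypothetical name used to keep it apart from the original struct; only the body of Next differs):

```csharp
using System;

// Hypothetical variant of PtrIter<T> with the dereference and
// post-increment merged into a single expression.
unsafe struct PtrIterMerged<T>(T* ptr, nuint count)
where T: unmanaged {
    public bool Next(out T item) {
        if (count != 0) {
            item = *ptr++;   // read the current element, then advance the pointer
            count--;
            return true;
        }
        item = default;
        return false;
    }
}
```

Both forms read the current element and then advance the pointer, so they are semantically equivalent.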

Configuration

.NET SDK:
 Version:           9.0.100-rtm.24512.1
 Commit:            5b9d9d4677
 Workload version:  9.0.100-manifests.87287131
 MSBuild version:   17.12.3+4ae11fa8e

Regression?

No

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.
