Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize Memmove unrolling for constant src #108576

Merged
merged 9 commits into from
Oct 8, 2024

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Oct 5, 2024

LowerCallMemmove already does a good job unrolling Memmove. Unfortunately, it leaves some opportunities on the table when the source can be a constant (esp, GPR constant), so we can skip a memory load. The reason is that LowerCallMemmove cannot rely on VN for that as it's too late.

void Test_utf16(Span<char> destination) => "True".CopyTo(destination);
void Test_utf8(Span<byte> destination) => "True"u8.CopyTo(destination);

Codegen diff:

; Method Benchmarks:Test_utf16(System.Span`1[ushort]):this (FullOpts)
       sub      rsp, 40
       mov      rax, bword ptr [rdx]
       mov      ecx, dword ptr [rdx+0x08]
       cmp      ecx, 4
       jl       SHORT G_M39687_IG04
-      mov      rcx, 0x21A00204FFC
-      mov      rdx, qword ptr [rcx]
-      mov      qword ptr [rax], rdx
+      mov      rcx, 0x65007500720054
+      mov      qword ptr [rax], rcx
       add      rsp, 40
       ret      
G_M39687_IG04:
       call     [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
       int3     
-; Total bytes of code: 43
+; Total bytes of code: 40


; Method Benchmarks:Test_utf8(System.Span`1[ubyte]):this (FullOpts)
       sub      rsp, 40
       mov      rax, bword ptr [rdx]
       mov      ecx, dword ptr [rdx+0x08]
       cmp      ecx, 4
       jl       SHORT G_M38560_IG04
-      mov      rcx, 0x2C391CC5380      ; static handle
-      mov      edx, dword ptr [rcx]
-      mov      dword ptr [rax], edx
+      mov      dword ptr [rax], 0x65757254
       add      rsp, 40
       ret      
G_M38560_IG04:
       call     [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
       int3     
-; Total bytes of code: 41
+; Total bytes of code: 33

Diffs

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 5, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo

This comment was marked as resolved.

@EgorBo EgorBo marked this pull request as ready for review October 5, 2024 22:06
@EgorBo

This comment was marked as resolved.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2024

@MihuBot

@EgorBo
Copy link
Member Author

EgorBo commented Oct 6, 2024

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr pgo

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@EgorBo
Copy link
Member Author

EgorBo commented Oct 7, 2024

PTAL @jakobbotsch @dotnet/jit-contrib
Diffs (678 improved, 370 regressed) there are size "regressions" from the unrollings as expected. Outerloop passed.

@EgorBo EgorBo merged commit 667c007 into dotnet:main Oct 8, 2024
106 of 108 checks passed
@EgorBo EgorBo deleted the opt-memmove-cns branch October 8, 2024 13:04
@xtqqczze
Copy link
Contributor

xtqqczze commented Oct 8, 2024

@MihuBot -arm64

@xtqqczze
Copy link
Contributor

xtqqczze commented Oct 8, 2024

@EgorBo Regressions on arm64?

Method Toolchain Mean Error Ratio
WriteBools Main 604.1 ns 0.33 ns 1.00
WriteBools PR 477.4 ns 0.03 ns 0.79

EgorBot/runtime-utils#110 (comment)

Method Toolchain Mean Error Ratio
WriteBools Main 604.1 ns 0.12 ns 1.00
WriteBools PR 531.0 ns 0.73 ns 0.88

EgorBot/runtime-utils#116 (comment)

@EgorBo
Copy link
Member Author

EgorBo commented Oct 8, 2024

@EgorBo Regressions on arm64?

load might be faster than up to 4 instructions to compose a 64bit constant for "FALS" or "TRUE" utf16 strings on arm64, but it's tricky in the real world, the load might be under contention. Anyway, we'll see once dotnet/performance run finishes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants