JIT: Add a disabled-by-default implementation of strength reduction #104243

jakobbotsch · 2024-07-01T13:35:31Z

This adds a disabled-by-default implementation of strength reduction. At this point the implementation should be correct, however it is currently both a size and perfscore regression when it is enabled. More work will be needed to get the heuristics right and to make it kick in for more cases.

Strength reduction replaces "expensive" operations computed on every loop iteration with cheaper ones by creating more induction variables. In C# terms it effectively transforms something like

private struct S
{
    public int A, B, C;
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static float Sum(S[] ss)
{
    int sum = 0;
    for (int i = 0; i < ss.Length; i++)
    {
        S v = ss[i];
        sum += v.A;
        sum += v.B;
        sum += v.C;
    }

    return sum;
}

into an equivalent

int sum = 0;
ref S curS = ref ss[0];
for (int i = 0; i < ss.Length; i++)
{
  sum += curS.A;
  sum += curS.B;
  sum += curS.C;
  curS = ref Unsafe.Add(ref curS, 1);
}

With strength reduction enabled this PR thus changes codegen of the standard foreach version above from

G_M63518_IG03:  ;; offset=0x0011
       lea      r10, [rdx+2*rdx]
       lea      r10, bword ptr [rcx+4*r10+0x10]
       mov      r9d, dword ptr [r10]
       mov      r11d, dword ptr [r10+0x04]
       mov      r10d, dword ptr [r10+0x08]
       add      eax, r9d
       add      eax, r11d
       add      eax, r10d
       inc      edx
       cmp      r8d, edx
       jg       SHORT G_M63518_IG03
						;; size=36 bbWeight=4 PerfScore 39.00

to

G_M63518_IG03:  ;; offset=0x000D
       add      rcx, 16
						;; size=4 bbWeight=0.25 PerfScore 0.06

G_M63518_IG04:  ;; offset=0x0011
       mov      r8, rcx
       mov      r10d, dword ptr [r8]
       mov      r9d, dword ptr [r8+0x04]
       mov      r8d, dword ptr [r8+0x08]
       add      eax, r10d
       add      eax, r9d
       add      eax, r8d
       add      rcx, 12
       dec      edx
       jne      SHORT G_M63518_IG04
						;; size=31 bbWeight=4 PerfScore 34.00

on x64. Further improvements can be made to enable better address modes.

The current heuristics try to ensure that we do not actually end up with more primary induction variables. The strength reduction only kicks in when it thinks that all uses of the primary IV can be replaced by the new primary IV. However, uses inside loop exit tests are allowed to stay unreplaced by the assumption that the downwards loop transformation will be able to get rid of them.

Getting the cases around overflow right turned out to be hard and required reasoning about trip counts that was added in a previous PR. Generally, the issue is that we need to prove that transforming a zero extension of an add recurrence to a 64-bit add recurrence is legal. For example, for a simple case of

for (int i = 0; i < arr.Length; i++)
  sum += arr[i];

the IV analysis is eventually going to end up wanting to show that zext<64>(int32 <L, 0, 1>) => int64 <L, 0, 1> is a correct transformation. This requires showing that the add recurrence does not step past 2^32-1, which requires the bound on the trip count that we can now compute. The reasoning done for both the trip count and around the overflow is still very limited but can be improved incrementally.

The implementation works by considering every primary IV of the loop in turn, and by initializing 'cursors' pointing to each use of the primary IV. It then tries to repeatedly advance these cursors to the parent of the uses while it results in a new set of cursors that still compute the same (now derived) IV. If it manages to do this once, then replacing the cursors by a new primary IV should result in the old primary IV no longer being necessary, while having replaced some operations by cheaper ones.

This adds a disabled-by-default implementation of strength reduction. At this point the implementation should be correct, however it is currently both a size and perfscore regression when it is enabled. More work will be needed to get the heuristics right and to make it kick in for more cases. Strength reduction replaces "expensive" operations computed on every loop iteration with cheaper ones by creating more induction variables. In C# terms it effectively transforms something like ``` private struct S { public int A, B, C; } [MethodImpl(MethodImplOptions.NoInlining)] private static float Sum(S[] ss) { int sum = 0; foreach (S v in ss) { sum += v.A; sum += v.B; sum += v.C; } return sum; } ``` into an equivalent ``` int sum = 0; ref S curS = ref ss[0]; for (int i = 0; i < ss.Length; i++) { sum += curS.A; sum += curS.B; sum += curS.C; curS = ref Unsafe.Add(ref curS, 1); } ``` With strength reduction enabled this PR thus changes codegen of the standard `foreach` version above from ```asm G_M63518_IG03: ;; offset=0x0011 lea r10, [rdx+2*rdx] lea r10, bword ptr [rcx+4*r10+0x10] mov r9d, dword ptr [r10] mov r11d, dword ptr [r10+0x04] mov r10d, dword ptr [r10+0x08] add eax, r9d add eax, r11d add eax, r10d inc edx cmp r8d, edx jg SHORT G_M63518_IG03 ;; size=36 bbWeight=4 PerfScore 39.00 ``` to ```asm G_M63518_IG04: ;; offset=0x0011 mov r8, rcx mov r10d, dword ptr [r8] mov r9d, dword ptr [r8+0x04] mov r8d, dword ptr [r8+0x08] add eax, r10d add eax, r9d add eax, r8d add rcx, 12 dec edx jne SHORT G_M63518_IG04 ;; size=31 bbWeight=4 PerfScore 34.00 ``` on x64. Further improvements can be made to enable better address modes. The current heuristics try to ensure that we do not actually end up with more primary induction variables. The strength reduction only kicks in when it thinks that all uses of the primary IV can be replaced by the new primary IV. However, uses inside loop exit tests are allowed to stay unreplaced by the assumption that the downwards loop transformation will be able to get rid of them. Getting the cases around overflow right turned out to be hard and required reasoning about trip counts that was added in a previous PR. Generally, the issue is that we need to prove that transforming a zero extension of an add recurrence to a 64-bit add recurrence is legal. For example, for a simple case of ``` for (int i = 0; i < arr.Length; i++) sum += arr[i]; ``` the IV analysis is eventually going to end up wanting to show that `zext<64>(int32 <L, 0, 1>) => int64 <L, 0, 1>` is a correct transformation. This requires showing that the add recurrence does not step past 2^32-1, which requires the bound on the trip count that we can now compute. The reasoning done for both the trip count and around the overflow is still very limited but can be improved incrementally. The implementation works by considering every primary IV of the loop in turn, and by initializing 'cursors' pointing to each use of the primary IV. It then tries to repeatedly advance these cursors to the parent of the uses while it results in a new set of cursors that still compute the same (now derived) IV. If it manages to do this once, then replacing the cursors by a new primary IV should result in the old primary IV no longer being necessary, while having replaced some operations by cheaper ones.

dotnet-policy-service · 2024-07-01T13:39:57Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jakobbotsch · 2024-07-01T23:29:41Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

azure-pipelines · 2024-07-01T23:29:53Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2024-07-02T09:41:08Z

@MichalStrehovsky Have you seen this one before during ilc? https://dev.azure.com/dnceng-public/public/_build/results?buildId=726752&view=logs&j=15d34cf8-8614-5f8a-e364-3c281b6c41dd&t=2f724b17-e574-5ab4-a401-210ce585d93f&l=767

Do I understand correctly this is during the ilc that was jitted without my changes from this PR?

jakobbotsch · 2024-07-02T11:36:09Z

Going to bounce the stress pipelines which seem to have been in bad shape with e.g. #104263 and #104232.

jakobbotsch · 2024-07-02T13:17:51Z

Still have some correctness issues to sort out.. for example, we create an illegal byref in Test in the following example:

public static void Main()
{
    Test(ref Unsafe.NullRef<ushort>(), () => false);
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test(ref ushort b, Func<bool> sumIndex)
{
    int sum = 0;
    for (nuint i = 0; i < 10000; i++)
    {
        if (sumIndex())
            sum += Unsafe.Add(ref b, 0x100000 + i);
    }

    return sum;
}

G_M401_IG02:  ;; offset=0x000B
       xor      esi, esi
       lea      rdi, bword ptr [rcx+0x00200000]
       mov      ebp, 0x2710
						;; size=14 bbWeight=0.25 PerfScore 0.25

G_M401_IG03:  ;; offset=0x0019
       mov      rcx, gword ptr [rbx+0x08]
       call     [rbx+0x18]System.Func`1[ubyte]:Invoke():ubyte:this
       test     eax, eax
       je       SHORT G_M401_IG05
						;; size=11 bbWeight=4 PerfScore 25.00

G_M401_IG04:  ;; offset=0x0024
       movzx    rax, word  ptr [rdi]
       add      esi, eax
						;; size=5 bbWeight=2 PerfScore 4.50

G_M401_IG05:  ;; offset=0x0029
       add      rdi, 2
       dec      rbp
       jne      SHORT G_M401_IG03
						;; size=9 bbWeight=4 PerfScore 6.00

We end up creating and reporting a byref 0x00200000 eagerly that is never created in normal circumstances.

jakobbotsch · 2024-07-02T14:31:31Z

Just going to bail indiscriminately on the GC pointers case for now. Will handle some of the cases where we can prove it is ok in a follow-up.

…ion-2

jakobbotsch · 2024-07-02T21:00:50Z

cc @dotnet/jit-contrib PTAL @AndyAyersMS

The jitstress/libraries-jitstress failures look like #103577, #104316, #104317, #102706.

Diffs with this enabled do not look too encouraging yet. They looked quite a bit better before disabling the GC types, so hopefully will be able to get it into a better state once I start improving on the legality checks and heuristics. I want to do that in some smaller changes separately. Am going to push a commit to disable this by default again.

AndyAyersMS

Overall this looks great. I had few questions but they can be addressed (or ignored) in follow-ups.

src/coreclr/jit/scev.cpp

AndyAyersMS · 2024-07-03T14:47:50Z

src/coreclr/jit/inductionvariableopts.cpp

+            return false;
+        }
+
+        bool isInsideExitTest =


Maybe we should compute the set of exit blocks during loop analysis?

Or at least introduce some helper for this... let me do it as part of a follow-up.

AndyAyersMS · 2024-07-03T14:50:20Z

src/coreclr/jit/inductionvariableopts.cpp

+//
+//   It is not a correctness requirement that we remove all uses; if we end up
+//   not doing so (e.g. because a cursor was not created by this function),
+//   then we may just end up with extra primary IVs in the loop.


Would it be interesting to have a stress mode where you intentionally don't create cursors all the places you could?

Yeah, I think so. Good idea for a follow-up.

AndyAyersMS · 2024-07-03T15:14:06Z

src/coreclr/jit/inductionvariableopts.cpp

+    // The algorithm here works in the following way: we process each primary
+    // IV in turn. For every primary IV, we create a 'cursor' pointing to every
+    // use of that primary IV. We then continuously advance each cursor to the
+    // parent node as long as all cursors represent the same derived IV. Once we


I'm curious about the "same" derived IV requirement ... if we are say converting from a char array to an int array

int[] a = ... char[] b = ... for (int i = 0; ...) { a[i] = (int) b[i]; }

if I understand correctly the current code would not strength reduce because we'd have <L, 0, 4> vs <L, 0, 2>? Seems like we still might want to strength reduce as long as the set of cursors doesn't expand into too many different new IVs? Or say if it is relatively cheap to convert from one family to another (say multiply by 2 which can be done with scaled address mode).

Also wonder if this profitability equation changes if we have post-increment modes (I forget where we ended up on this).

if I understand correctly the current code would not strength reduce because we'd have <L, 0, 4> vs <L, 0, 2>? Seems like we still might want to strength reduce as long as the set of cursors doesn't expand into too many different new IVs? Or say if it is relatively cheap to convert from one family to another (say multiply by 2 which can be done with scaled address mode).

Yeah, that's right. I think you're right we can generalize this: we can keep going if the addrecs have the same 'start' and all step values are divisible by the minimum step value. I'll consider that as a follow-up.

Also wonder if this profitability equation changes if we have post-increment modes (I forget where we ended up on this).

Currently there's really no heuristics here; we kind of just do strength reduction whenever we have a candidate. I definitely need to sit down and consider more carefully how the transformation affects performance and size (usually it seems to regress size because it requires computing some 'start' value in the preheader).

AndyAyersMS · 2024-07-03T15:19:25Z

src/coreclr/jit/inductionvariableopts.cpp

+                break;
+            }
+
+            // TODO-CQ: If this is now the source to a store, we can


Do you know how often you see this? I suppose for normal array uses it's rare, we may only see it for pointer math where users have CSE'd parts of the computation or something.

Fairly often:

Advanced cursor node type ADD : 26343 CAST : 15729 LSH : 11239 BOUNDS_CHECK : 8355 IND : 5714 LT : 3143 GT : 2536 STORE_LCL_VAR : 1319 GE : 1252 SUB : 1008 CALL : 993 STORE_BLK : 907 HWINTRINSIC : 892 BLK : 841 UMOD : 822 LE : 437 STOREIND : 357 EQ : 297 MUL : 89 AND : 63 NE : 61 MOD : 38 INTRINSIC : 16 NEG : 14 UDIV : 10 COMMA : 7 BITCAST : 2 DIV : 2 RSH : 2

jakobbotsch · 2024-07-04T08:46:32Z

There's a few diffs from some of the new sanity checks added when creating SCEVs.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 1, 2024

dotnet-policy-service bot assigned jakobbotsch Jul 1, 2024

build-analysis bot mentioned this pull request Jul 1, 2024

[8.0] [wasm] [AOT] HTTP Error 500.30 - ASP.NET Core app failed to start #95160

Open

jakobbotsch closed this Jul 1, 2024

jakobbotsch reopened this Jul 1, 2024

Fix a GC hole

efb641c

build-analysis bot mentioned this pull request Jul 1, 2024

Failed test: JIT/SIMD/LdfldGeneric_ro/LdfldGeneric_ro (arm64) #104263

Closed

This comment was marked as spam.

Sign in to view

This comment was marked as resolved.

Sign in to view

Disallow GC typed add recs

3a1850a

This comment was marked as resolved.

Sign in to view

Merge branch 'main' of github.com:dotnet/runtime into strength-reduct…

7c0e2f8

…ion-2

This comment was marked as resolved.

Sign in to view

Disable strength reduction

3d66247

jakobbotsch marked this pull request as ready for review July 2, 2024 20:55

jakobbotsch requested a review from AndyAyersMS July 2, 2024 21:00

This was referenced Jul 3, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

The job running on agent NetCore-Public ran longer than the maximum time #104044

Closed

AndyAyersMS approved these changes Jul 3, 2024

View reviewed changes

Fix some comments

c06728d

jakobbotsch merged commit a8616d9 into dotnet:main Jul 4, 2024
102 of 107 checks passed

jakobbotsch deleted the strength-reduction-2 branch July 4, 2024 08:46

This was referenced Jul 5, 2024

Improve JIT loop optimizations (.NET 9) #93144

Closed

JIT: Recognize more addressing modes #104538

Open

jakobbotsch mentioned this pull request Jul 22, 2024

JIT: support strength reducing multiple IVs with different step values provided they have a gcd > 1 #105241

Open

github-actions bot locked and limited conversation to collaborators Aug 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Add a disabled-by-default implementation of strength reduction #104243

JIT: Add a disabled-by-default implementation of strength reduction #104243

jakobbotsch commented Jul 1, 2024 •

edited

Loading

dotnet-policy-service bot commented Jul 1, 2024

jakobbotsch commented Jul 1, 2024

azure-pipelines bot commented Jul 1, 2024

jakobbotsch commented Jul 2, 2024

jakobbotsch commented Jul 2, 2024 •

edited

Loading

This comment was marked as spam.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

jakobbotsch commented Jul 2, 2024

jakobbotsch commented Jul 2, 2024

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

jakobbotsch commented Jul 2, 2024

AndyAyersMS left a comment

AndyAyersMS Jul 3, 2024

jakobbotsch Jul 3, 2024

AndyAyersMS Jul 3, 2024

jakobbotsch Jul 3, 2024

AndyAyersMS Jul 3, 2024

jakobbotsch Jul 3, 2024

AndyAyersMS Jul 3, 2024

jakobbotsch Jul 4, 2024

jakobbotsch commented Jul 4, 2024

JIT: Add a disabled-by-default implementation of strength reduction #104243

JIT: Add a disabled-by-default implementation of strength reduction #104243

Conversation

jakobbotsch commented Jul 1, 2024 • edited Loading

dotnet-policy-service bot commented Jul 1, 2024

jakobbotsch commented Jul 1, 2024

azure-pipelines bot commented Jul 1, 2024

jakobbotsch commented Jul 2, 2024

jakobbotsch commented Jul 2, 2024 • edited Loading

This comment was marked as spam.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

jakobbotsch commented Jul 2, 2024

jakobbotsch commented Jul 2, 2024

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

jakobbotsch commented Jul 2, 2024

AndyAyersMS left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakobbotsch commented Jul 4, 2024

jakobbotsch commented Jul 1, 2024 •

edited

Loading

jakobbotsch commented Jul 2, 2024 •

edited

Loading