
Added support to generate OpenMP parallel construct clauses, at this time for num_threads and proc_bind #2944

Merged

Conversation

AlexandreEichenberger (Collaborator)
Added support to generate an OpenMP parallel construct with num_threads and proc_bind clauses.

First, I added two optional parameters to the krnl.parallel operation:

      %loop_block, %loop_local = krnl.block %0 32 : (!krnl.loop) -> (!krnl.loop, !krnl.loop)
      krnl.parallel(%loop_block), num_threads(%c8_i32) {proc_bind = "spread"} : !krnl.loop
      krnl.iterate(%loop_block) with (%0 -> %arg1 = 0 to 16384){
        %1 = krnl.get_induction_var_value(%loop_block) : (!krnl.loop) -> index
        %2 = vector.load %reshape[%1] : memref<16384xf32>, vector<32xf32>
        %3 = vector.load %reshape_2[%1] : memref<16384xf32>, vector<32xf32>
        %4 = arith.addf %2, %3 : vector<32xf32>
        vector.store %4, %reshape_4[%1] : memref<16384xf32>, vector<32xf32>
      }

This allows the user to associate an optional num_threads and/or proc_bind with parallel loops via the create.krnl.parallel builder.

When lowering to affine (or when generating affine or scf parallel operations), we insert inside the loop a KrnlParallelClauseOp, which takes one mandatory value (the loop index) to identify the parallel loop targeted by the clause, plus the optional num_threads (a value) and proc_bind (a string).

  affine.parallel (%arg1) = (0) to (16384) step (32) {
    %0 = vector.load %reshape[%arg1] : memref<16384xf32>, vector<32xf32>
    %1 = vector.load %reshape_2[%arg1] : memref<16384xf32>, vector<32xf32>
    %2 = arith.addf %0, %1 : vector<32xf32>
    vector.store %2, %reshape_4[%arg1] : memref<16384xf32>, vector<32xf32>
    affine.for %arg2 = 0 to 1 {
    }
    krnl.parallel_clause(%arg1), num_threads(%c8_i32) {proc_bind = "spread"} : index
  }

After the parallel constructs are lowered to OpenMP constructs, a simple pass (createProcessKrnlParallelClausePass) identifies each KrnlParallelClauseOp, locates its enclosing omp.parallel construct, and migrates the clauses to that OpenMP construct.

  omp.parallel num_threads(%c8_i32 : i32) proc_bind(spread) {
    omp.wsloop {
      omp.loop_nest (%arg1) : index = (%c0) to (%c16384) step (%c32) {
        memref.alloca_scope  {
          %0 = vector.load %reshape[%arg1] : memref<16384xf32>, vector<32xf32>
          %1 = vector.load %reshape_2[%arg1] : memref<16384xf32>, vector<32xf32>
          %2 = arith.addf %0, %1 : vector<32xf32>
          vector.store %2, %reshape_4[%arg1] : memref<16384xf32>, vector<32xf32>
        }
        omp.yield
      }
      omp.terminator
    }
    omp.terminator
  }

Added 2 MLIR lit test files.

Signed-off-by: Alexandre Eichenberger <[email protected]>
@tungld (Collaborator) left a comment

LGTM.

needParallelClause = false;
// Current approach: insert after yield, then move before it.
PatternRewriter::InsertionGuard insertGuard(builder);
builder.setInsertionPointAfter(yieldOp);
Collaborator:
Doesn't setInsertionPoint(yieldOp) work for inserting just before yieldOp?

AlexandreEichenberger (Collaborator, Author):

For some reason, if I don't have the moveBefore, I get this error:

flt_orig_model.mlir:18:3: error: operand #0 does not dominate this use
  krnl.iterate(%loop_block) with (%0 -> %arg1 = 0 to 16384){
  ^
flt_orig_model.mlir:18:3: note: see current operation: "krnl.parallel_clause"(%arg1, %0) {proc_bind = "spread"} : (index, i32) -> ()
flt_orig_model.mlir:18:3: note: operand defined as a block argument (block #0 in a child region)

Strangely, with the moveBefore(yieldOp), I get the same result with either setInsertionPointAfter or setInsertionPoint.
There is something fragile about the lowering of Krnl to Affine with respect to "movable".

Since it works as is, I prefer to leave it that way.
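For clarity, the workaround discussed above can be summarized in the following pseudocode sketch (C++ with MLIR-style API names; the KrnlParallelClauseOp builder arguments and surrounding variables are assumptions, not the exact PR code):

    // Sketch of the insert-after-then-move-before workaround.
    PatternRewriter::InsertionGuard insertGuard(builder);
    // First insert the new op *after* the terminator...
    builder.setInsertionPointAfter(yieldOp);
    auto clauseOp = builder.create<KrnlParallelClauseOp>(
        loc, loopIndex, numThreadsVal, procBindAttr); // hypothetical signature
    // ...then move it back *before* the yield so the block stays well-formed.
    // Per the discussion, skipping this moveBefore triggers the
    // "operand #0 does not dominate this use" verifier error.
    clauseOp->moveBefore(yieldOp);

setInsertionPointAfter and moveBefore are real MLIR APIs; the combination here reflects the empirical workaround the author describes, not a documented requirement.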

Collaborator:

This conversion pass traverses the IR by itself and manipulates the graph directly. That might be why it is fragile.

Collaborator:

OK, then the current way is fine.

// Use clause only for the first one (expected to be the outermost one).
// Ideally, we would generate here a single, multi-dimensional
// AffineParallelOp, and we would not need to reset the flag.
needParallelClause = false;
Collaborator:

Is this condition used afterwards?

@AlexandreEichenberger (Collaborator, Author), Sep 18, 2024:

Yes: when we need the parallel clause, only the first iteration of the for (Value loopRef : loopRefs) loop adds the KrnlParallelClauseOp.
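The flag usage being discussed can be sketched as follows (C++ pseudocode; the loop body and the parallelClause builder call are illustrative assumptions based on the quoted snippets, not the exact PR code):

    // needParallelClause is set when the krnl.parallel op carries
    // num_threads and/or proc_bind.
    bool needParallelClause = hasNumThreadsOrProcBind;
    for (Value loopRef : loopRefs) {
      // ... lower this loop to an affine.parallel / scf.parallel ...
      if (needParallelClause) {
        // Attach the clause info only to the first (outermost) parallel loop.
        createKrnlParallelClause(loopIndex, numThreads, procBind); // hypothetical helper
        needParallelClause = false; // subsequent loops skip this
      }
    }

As the quoted comment notes, generating a single multi-dimensional AffineParallelOp would make resetting the flag unnecessary.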

@chentong319 (Collaborator) left a comment

LGTM!

@AlexandreEichenberger AlexandreEichenberger merged commit d03eff2 into onnx:main Sep 19, 2024
7 checks passed
@jenkins-droid: Jenkins Linux s390x Build #15665 [push] Added support to generat... started at 09:54
@jenkins-droid: Jenkins Linux amd64 Build #15662 [push] Added support to generat... started at 08:54
@jenkins-droid: Jenkins Linux ppc64le Build #14692 [push] Added support to generat... started at 10:05
@jenkins-droid: Jenkins Linux amd64 Build #15662 [push] Added support to generat... passed after 1 hr 6 min
@jenkins-droid: Jenkins Linux s390x Build #15665 [push] Added support to generat... passed after 1 hr 39 min
@jenkins-droid: Jenkins Linux ppc64le Build #14692 [push] Added support to generat... passed after 2 hr 3 min

4 participants