-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support to generate OpenMP parallel construct clauses, at this time for num_threads and proc_bind #2944
Added support to generate OpenMP parallel construct clauses, at this time for num_threads and proc_bind #2944
Conversation
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
needParallelClause = false; | ||
// Currently approach: insert after yield and then move before it. | ||
PatternRewriter::InsertionGuard insertGuard(builder); | ||
builder.setInsertionPointAfter(yieldOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't setInsertionPoint(yieldOp)
work for inserting just before yieldOp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reasons, if I don't have the "moveBefore", it get's me this error
flt_orig_model.mlir:18:3: error: operand #0 does not dominate this use
krnl.iterate(%loop_block) with (%0 -> %arg1 = 0 to 16384){
^
flt_orig_model.mlir:18:3: note: see current operation: "krnl.parallel_clause"(%arg1, %0) {proc_bind = "spread"} : (index, i32) -> ()
flt_orig_model.mlir:18:3: note: operand defined as a block argument (block #0 in a child region)
Strangely, with the moveBefore(yieldOp)
, I get the same result with the setInsertionPointAfter
or setInsertionPoint
.
There is something fragile about the lowering of Krnl to Affine with respect to "movable".
Since it works as is, I prefer to leave it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion pass traverses the IR by ourselves. We manipulate graph directly. That might be the reason why it is fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then it's fine with the current way.
Signed-off-by: Alexandre Eichenberger <[email protected]>
// Use clause only for the first one (expected the outermost one). | ||
// Ideally, we would generate here a single, multi-dimensional | ||
// AffineParallelOp, and we would not need to reset the flag. | ||
needParallelClause = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this condition used afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when we need the parallel clause, then only the first loop iteration in the for(Value loopRef : loopRefs)
will execute the addition of the KrnlParallelClauseOp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Jenkins Linux s390x Build #15665 [push] Added support to generat... started at 09:54 |
Jenkins Linux amd64 Build #15662 [push] Added support to generat... started at 08:54 |
Jenkins Linux ppc64le Build #14692 [push] Added support to generat... started at 10:05 |
Jenkins Linux amd64 Build #15662 [push] Added support to generat... passed after 1 hr 6 min |
Jenkins Linux s390x Build #15665 [push] Added support to generat... passed after 1 hr 39 min |
Jenkins Linux ppc64le Build #14692 [push] Added support to generat... passed after 2 hr 3 min |
Added support to generate OpenMP parallel construct with
num_threads
andproc_bind
clause.First I added two optional parameters to the
krnl.parallel
operation:which allows the user to associate parallel loops with an optional
num_threads
orproc_bind
to thecreate.krnl.parallel
builder.When lowering to affine (or if generating affine or scf parallel operation), we then insert inside the loop a
KrnlParallelClauseOp
, which takes one mandatory value (the loop index), to identify the parallel loop targeted by the clause, and the optionalnum_threads
(a value) and theproc_bind
(a string).After the parallel constructs are lowered to OpenMP construct, a simple pass (
createProcessKrnlParallelClausePass
) identify theKrnlParallelClauseOp
, locate its enclosingomp.parallel
construct, and migrate the clauses to the OpenMP constructs.Added 2 mlir lit test files