-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Substitute zdnn calls for stick/unstick late, after most ZLow optimizations are performed #2812
Substitute zdnn calls for stick/unstick late, after most ZLow optimizations are performed #2812
Conversation
… pass Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
Signed-off-by: Alexandre Eichenberger <[email protected]>
@tungld the lowering is done from zlow to krnl late, but it seems to work well. Let me know if you have reservations. |
Signed-off-by: Alexandre Eichenberger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Glad to hear we get performance improvement in the dynamic cases. I also prefer the lower-level generation of stick/unstick when most of optimizations have been applied.
@@ -467,6 +468,31 @@ class AddSubWithRHSZeroExpandPattern : public OpRewritePattern<OP_TYPE> { | |||
} | |||
}; | |||
|
|||
class RemoveReshapeWithIdentityPattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's clever to do it here. Thanks!
@@ -643,6 +671,15 @@ void getRewriteONNXForZHighDynamicallyLegal( | |||
return isSuitableForZDNN<ONNXConvOp>(op) || | |||
!canInferencePadsForNNPAConv(op); | |||
}); | |||
#if 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you will remove this in the final version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tx
using MDBuilder = MultiDialectBuilder<IndexExprBuilderForKrnl, KrnlBuilder, | ||
MathBuilder, MemRefBuilder, VectorBuilder, AffineBuilder, SCFBuilder>; | ||
|
||
/// Remove unstick if there is no use of its second operand except itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment does not seem to reflect the following pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
} | ||
}; | ||
|
||
/// Remove stick if there is no use of its second operand except itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment does not seem to reflect the following pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tx
Signed-off-by: Alexandre Eichenberger <[email protected]>
@jenkins-droid test this please |
1 similar comment
@jenkins-droid test this please |
Jenkins Linux ppc64le Build #13825 [push] Substitute zdnn calls fo... started at 12:51 |
Jenkins Linux amd64 Build #14795 [push] Substitute zdnn calls fo... started at 11:39 |
Jenkins Linux s390x Build #14800 [push] Substitute zdnn calls fo... started at 12:39 |
Jenkins Linux amd64 Build #14795 [push] Substitute zdnn calls fo... passed after 1 hr 5 min |
Jenkins Linux s390x Build #14800 [push] Substitute zdnn calls fo... passed after 1 hr 28 min |
Jenkins Linux ppc64le Build #13825 [push] Substitute zdnn calls fo... passed after 1 hr 56 min |
The many prior instances used buffered / tiled stick unstick code gen, but the redundant load / store ended up costing more than the simple approach of stick/unstick and store.
Also, the performance of dynamic shapes were quite lagging, because many key optimizations to remove unnecessary stick / unstick rely on ZLow stick/unstick patterns. Thus converting stick/unstick early (in the ZHigh to ZLow were preventing these opts.
So now I have a separate pass that changes the ZLow stick/unstick after zlow rewrite operations, and it now results in performance speedup.
Detailed results are available, the short story is that using CSU (compiler gen stick/unstick) Stick instead of zDNN Stick is 1.56x faster for Roberta (dyn shapes), and 6.71 x faster when using 6 threads.
For unstick, the speedup is 1.52 and 5.73x faster
Results for entire Roberta base with Batch size of 6 and 6 threads is a speedup of 1.04x in sequential mode, and 1.27x in parallel mode.
Note that this optimization still relies on having a LLVM with the
affine.prefetch
that accepts zAIU formats (aka div/mod affine expressions). The PR in LLVM was accepted and merged, it has to filter its way back into our LLVM.Code was verified by small corner examples and verifying the output of Roberta with/without it.