-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing the location of DimAnalysis in onnx-to-zhigh pass and some rules in zhigh-to-onnx pass #2794
Conversation
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Do you account for constants? Meaning do we do something different if the non-stickified input is a constant or not? As stickily of constants is "free" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, maybe double check that the DIV performs better with this PR than before.
// (ONNXDivOp $x, $y), | ||
// [(NotBlockArgument:$x), (HasOneUse:$s_x), (HasOneUse:$s_y)] | ||
// >; | ||
//def replaceZHighDivPattern1 : Pat< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check this one? DIV on CPU is quite a bit slower in general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since DIV on CPU is quite a bit slower in genera, I disabled this rule to keep Div on NNPA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extensively tested the commit from yesterday. (200 iternationsw without output variations, not performance regression in about a dozen models tested).
I spot tested the lastest commit and confirmed after a few iterations roberta output is still stable.
I'm approving from my end but @AlexandreEichenberger or someone else should still review from a code perspective.
Jenkins Linux s390x Build #14696 [push] Fixing the location of D... started at 09:30 |
Jenkins Linux ppc64le Build #13691 [push] Fixing the location of D... started at 09:40 |
Jenkins Linux amd64 Build #14666 [push] Fixing the location of D... started at 08:30 |
Jenkins Linux amd64 Build #14666 [push] Fixing the location of D... passed after 1 hr 17 min |
Jenkins Linux ppc64le Build #13691 [push] Fixing the location of D... aborted after 1 hr 42 min |
Jenkins Linux s390x Build #14696 [push] Fixing the location of D... aborted after 1 hr 42 min |
The onnx-to-zhigh pass has two phases: 1) converting multiple onnx ops into a single zhigh op, and 2) converting a single onnx op to a single zhigh op, where the second phase uses DimAnalysis (Patterns in the 1st phase at this moment does not use DimAnalysis)
The problem is DimAnalysis is currently called before the 1st phase, which is not good because the 1st phase may change the IR so the information from DimAnalysis is obsoleted to the 2nd phase. Correct position for DimAnalysis would be just before the 2nd phase.
Other than that, this PR changes slightly the rules in zhigh-to-onnx pass so that for binary ops, only one input (instead of two) that is from stick would be enough to trigger the rule to convert a zhigh op back to an onnx op.
Resolves #2789