Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA #2782

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

cjvolzka
Copy link
Collaborator

@cjvolzka cjvolzka commented Apr 3, 2024

  • Add a flag to turn on/off scalar broadcasting binary op in NNPA

Signed-off-by: Tung D. Le [email protected]


Signed-off-by: Tung D. Le [email protected]
Co-authored-by: Alexandre Eichenberger [email protected]
(cherry picked from commit 08d4fed)

…ops to NNPA (onnx#2778)

* Add a flag to turn on/off scalar broadcasting binary op in NNPA

Signed-off-by: Tung D. Le <[email protected]>

---------

Signed-off-by: Tung D. Le <[email protected]>
Co-authored-by: Alexandre Eichenberger <[email protected]>
(cherry picked from commit 08d4fed)
@cjvolzka cjvolzka changed the title Cherry Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA Cherry-pick Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA Apr 3, 2024
Copy link
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I assume it is to cover a regression. Are there examples where this optimization is worth it? And others where it is not? It feel that it could use a performance model. Note that the DIV op does also work well on CPU with parallel (as there is non-negligible amount of work).

@@ -55,6 +55,13 @@ llvm::cl::opt<bool> nnpaEnableCompilerStickUnstick(
"stick/unstick code. Default is false."),
llvm::cl::init(false), llvm::cl::cat(OnnxMlirOptions));

llvm::cl::opt<bool> nnpaEnableScalarBcastBinary(
"nnpa-enable-scalar-bcast-binary",
llvm::cl::desc("Enable the lowering to NNPA the broadcasting binary ops "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a clearer wording:

Enable the lowering to NNPA of binary operations with broadcasting of a scalar operand. Currently only enable ONNXDiv. Default is false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it is to cover a regression.

Yes. This was a cherry pick of #2778 into our upcoming 0.4.2.0 release. It resolves the performance regression for roberta-sequence-classification-9 from #2769.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there examples where this optimization is worth it? And others where it is not? It feel that it could use a performance model.

I was effective to roberta-base, but later a rule to propagate Div to the inputs of MatMul can remove Div.
I don't have a model to which this feature is effective now, but in general, we want to offload a broadcasting div to NNPA only if it is surrounded by NNPA ops.

I reworded the explanation in PR #2780.

@cjvolzka cjvolzka merged commit 35a61d3 into onnx:0.4.2.0 Apr 3, 2024
8 checks passed
@cjvolzka cjvolzka deleted the 4.2-flag-cherry-pick branch April 3, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants