Cherry-pick Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA #2782

cjvolzka · 2024-04-03T12:44:56Z

Add a flag to turn on/off scalar broadcasting binary op in NNPA

Signed-off-by: Tung D. Le [email protected]

Co-authored-by: Alexandre Eichenberger [email protected]
(cherry picked from commit 08d4fed)


AlexandreEichenberger

LGTM.

I assume it is to cover a regression. Are there examples where this optimization is worth it? And others where it is not? It feels that it could use a performance model. Note that the DIV op also works well on the CPU with parallelization (as there is a non-negligible amount of work).

AlexandreEichenberger · 2024-04-03T15:32:26Z

src/Accelerators/NNPA/Compiler/NNPACompilerOptions.cpp

@@ -55,6 +55,13 @@ llvm::cl::opt<bool> nnpaEnableCompilerStickUnstick(
                   "stick/unstick code. Default is false."),
    llvm::cl::init(false), llvm::cl::cat(OnnxMlirOptions));

+llvm::cl::opt<bool> nnpaEnableScalarBcastBinary(
+    "nnpa-enable-scalar-bcast-binary",
+    llvm::cl::desc("Enable the lowering to NNPA the broadcasting binary ops "


I would suggest a clearer wording:

Enable the lowering to NNPA of binary operations with broadcasting of a scalar operand. Currently only enable ONNXDiv. Default is false.
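
To make the wording concrete, here is a minimal sketch of what "a binary operation with broadcasting of a scalar operand" could mean in terms of operand shapes. The helper names `isScalarShape` and `isScalarBroadcastBinary` are hypothetical and are not taken from the onnx-mlir code base. Treating any single-element tensor as a scalar is an assumption made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not onnx-mlir API): assume a tensor is a "scalar"
// when it holds exactly one element, i.e. rank 0 or every dimension is 1.
// Unknown (dynamic) dimensions are conservatively treated as non-scalar.
bool isScalarShape(const std::vector<int64_t> &shape) {
  for (int64_t dim : shape)
    if (dim != 1)
      return false;
  return true;
}

// Hypothetical predicate: the binary op broadcasts a scalar when exactly
// one of its two operands is a scalar and the other is a full tensor.
bool isScalarBroadcastBinary(
    const std::vector<int64_t> &lhs, const std::vector<int64_t> &rhs) {
  return isScalarShape(lhs) != isScalarShape(rhs);
}
```

Under these assumptions, dividing a `[2, 128, 768]` tensor by a `[1]` tensor qualifies, while dividing two `[2, 128, 768]` tensors does not.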

cjvolzka · 2024-04-03

> I assume it is to cover a regression.

Yes. This was a cherry-pick of #2778 into our upcoming 0.4.2.0 release. It resolves the performance regression for roberta-sequence-classification-9 from #2769.

tungld · 2024-04-04

> Are there examples where this optimization is worth it? And others where it is not? It feels that it could use a performance model.

It was effective for roberta-base, but a rule added later, which propagates Div to the inputs of MatMul, can remove the Div.
I don't currently have a model for which this feature is effective, but in general we want to offload a broadcasting Div to NNPA only if it is surrounded by NNPA ops.

I reworded the explanation in PR #2780.
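
The rule of thumb stated in this comment (offload a broadcasting Div only when it is surrounded by NNPA ops) could be sketched as below. This is an illustration, not what the PR implements: the flag in this PR only turns the lowering on or off. The `Device` enum, the `allOnNNPA` helper, and the `shouldOffloadScalarBcast` function are hypothetical names, and requiring every producer and consumer to run on NNPA is one possible reading of "surrounded".

```cpp
#include <vector>

// Hypothetical tag for where a neighboring op is expected to run.
enum class Device { CPU, NNPA };

// True only when there is at least one neighbor and all of them run on NNPA.
static bool allOnNNPA(const std::vector<Device> &neighbors) {
  if (neighbors.empty())
    return false;
  for (Device d : neighbors)
    if (d != Device::NNPA)
      return false;
  return true;
}

// Hypothetical decision rule for a scalar-broadcasting binary op:
// offload only if the feature flag is on AND the op is surrounded by
// NNPA ops, i.e. all of its producers and all of its consumers are on NNPA.
bool shouldOffloadScalarBcast(bool flagEnabled,
    const std::vector<Device> &producers,
    const std::vector<Device> &consumers) {
  return flagEnabled && allOnNNPA(producers) && allOnNNPA(consumers);
}
```

With this sketch, a Div sitting between two NNPA ops is offloaded when the flag is on, whereas a Div whose result feeds a CPU op stays on the CPU regardless of the flag.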

cjvolzka requested review from AlexandreEichenberger and tungld April 3, 2024 12:45

cjvolzka changed the title ~~Cherry Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA~~ Cherry-pick Add a flag to turn on/off the lowering of scalar broadcasting binary ops to NNPA Apr 3, 2024

AlexandreEichenberger approved these changes Apr 3, 2024


cjvolzka merged commit 35a61d3 into onnx:0.4.2.0 Apr 3, 2024
8 checks passed

cjvolzka deleted the 4.2-flag-cherry-pick branch April 3, 2024 17:11
