A rewrite pattern to optimize constant scaling in self-attention layer #2640
Conversation
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
Can this be abstracted to any pairwise operation? There could be some attention models that multiply by an inverse sqrt instead.
Good suggestion! Yes, it is quite straightforward to support multiplication also (I don't think it is applicable to addition and subtraction). Will add that soon. Thanks!
Signed-off-by: Tung D. Le <[email protected]>
LGTM
src/Dialect/ONNX/Rewrite.cpp
Outdated
Operation *lhsSubMatOp, *lhsAddOp;
bool matchLHS = matchShapeAddMatMul(lhs, A1, B1, lhsSubMatOp, lhsAddOp);

// Match rhs = shape_transform(X2*A2 + B2)
Nit: you could test matchRHS only when matchLHS fails. Less testing, same results:
bool matchRHS = !matchLHS && matchShapeAddMatMul(rhs, A2, B2, rhsSubMatOp, rhsAddOp);
Then you can get rid of the case where both match, as that will never happen.
I see. Updated the code to check matchRHS only when matchLHS failed. Thanks!
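For illustration, a standalone sketch of the short-circuit control flow (plain C++ with a hypothetical stub matcher; the real code calls matchShapeAddMatMul on MLIR values):

```cpp
#include <cstdio>

// Hypothetical stub standing in for matchShapeAddMatMul(lhs/rhs, ...).
static bool matchPattern(bool pretendItMatches) { return pretendItMatches; }

int main() {
  bool matchLHS = matchPattern(/*pretendItMatches=*/false);
  // Only probe the RHS when the LHS did not match; with the short-circuit,
  // matchLHS and matchRHS can never both be true, so that branch disappears.
  bool matchRHS = !matchLHS && matchPattern(/*pretendItMatches=*/true);
  if (!matchLHS && !matchRHS) {
    std::puts("no operand matches shape_transform(X*A + B): rewrite does not apply");
    return 0;
  }
  std::puts(matchLHS ? "scale the lhs constants" : "scale the rhs constants");
  return 0;
}
```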
@@ -209,6 +209,61 @@ bool haveSameStaticShape(Value lhs, Value rhs) {
  return hasStaticShape(lhsT) && (getShape(lhsT) == getShape(rhsT));
}

// Match v = shape_transform(X*A + B).
Do we cover the case where instead of X*A+B we have a Gemm op?
Good idea. I will add Gemm.
I added the case for Gemm.
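For context on why Gemm fits the same pattern: with its default attributes (alpha = 1, beta = 1, no transposition), ONNX Gemm computes Y = X*A + B, i.e. exactly the MatMul + Add pair the matcher looks for, so the same constant scaling applies. A minimal standalone check of that identity (tiny hard-coded 2x2 matrices in plain C++; an illustration, not the actual onnx-mlir matcher):

```cpp
#include <array>
#include <cassert>
#include <cstdio>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Y = alpha * X*A + beta * B, the ONNX Gemm formula (no transposition here).
static Mat2 gemm(const Mat2 &x, const Mat2 &a, const Mat2 &b, double alpha,
    double beta) {
  Mat2 y{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) {
      for (int p = 0; p < 2; ++p)
        y[i][j] += alpha * x[i][p] * a[p][j];
      y[i][j] += beta * b[i][j];
    }
  return y;
}

int main() {
  const Mat2 X{{{1, 2}, {3, 4}}}, A{{{5, 6}, {7, 8}}}, B{{{1, 1}, {1, 1}}};
  const double k = 8.0;

  // Dividing the Gemm output by k ...
  Mat2 before = gemm(X, A, B, 1.0, 1.0);
  // ... equals running Gemm on the pre-scaled constants A/k and B/k.
  Mat2 Ak = A, Bk = B;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) {
      Ak[i][j] /= k;
      Bk[i][j] /= k;
    }
  Mat2 after = gemm(X, Ak, Bk, 1.0, 1.0);
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      assert(before[i][j] / k == after[i][j]);
  std::puts("Gemm(X, A/k, B/k) == Gemm(X, A, B) / k with default alpha/beta");
  return 0;
}
```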
@jenkins-droid test this please
Signed-off-by: Tung D. Le <[email protected]>
Jenkins Linux s390x Build #13552 [push] A rewrite pattern to opt... started at 02:01
Jenkins Linux ppc64le Build #12549 [push] A rewrite pattern to opt... started at 02:07
Jenkins Linux amd64 Build #13525 [push] A rewrite pattern to opt... started at 01:01
Jenkins Linux s390x Build #13552 [push] A rewrite pattern to opt... passed after 1 hr 52 min
Jenkins Linux ppc64le Build #12549 [push] A rewrite pattern to opt... passed after 2 hr 5 min
Jenkins Linux amd64 Build #13525 [push] A rewrite pattern to opt... passed after 2 hr 15 min
In the self-attention layer, the output of MatMul is scaled by a constant factor via a division/multiplication operation. This patch rewrites the division/multiplication operation so that the constant input of MatMul will be scaled instead of its output. Thus, the scaling of the constant inputs can be folded at compile time.
For example, this patch rewrites the following pattern:

shape_transform(X1*A1 + B1) * shape_transform(X2*A2 + B2) / k

into

shape_transform(X1*A1 + B1) * shape_transform(X2*(A2/k) + B2/k)

if A2, B2 and k are constants, or into

shape_transform(X1*(A1/k) + B1/k) * shape_transform(X2*A2 + B2)

if A1, B1 and k are constants,
where
- * is matrix multiplication; + and / are element-wise addition and division.
- A1, A2, B1 and B2 are constants, so that the divisions A1/k, B1/k (or A2/k, B2/k) can be folded at compile time. k is a scalar constant so that it's broadcastable to all of A1, A2, B1, B2.
- shape_transform stands for a sequence of operations that only change the shape of the input but not its numerical values, for example: Reshape, Transpose, etc.
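To make the arithmetic behind the rewrite concrete, here is a minimal standalone check in plain C++ (tiny hard-coded 2x2 matrices, shape_transform taken as the identity; an illustration of the equivalence, not the actual rewrite code):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdio>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Plain 2x2 matrix multiplication (the '*' in the pattern).
static Mat2 matmul(const Mat2 &x, const Mat2 &a) {
  Mat2 y{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int p = 0; p < 2; ++p)
        y[i][j] += x[i][p] * a[p][j];
  return y;
}

// Element-wise addition (the '+' in the pattern).
static Mat2 add(const Mat2 &x, const Mat2 &b) {
  Mat2 y{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      y[i][j] = x[i][j] + b[i][j];
  return y;
}

// Element-wise division by a scalar (the '/' in the pattern).
static Mat2 divScalar(const Mat2 &x, double k) {
  Mat2 y{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      y[i][j] = x[i][j] / k;
  return y;
}

int main() {
  const Mat2 X1{{{1, 2}, {3, 4}}}, A1{{{5, -6}, {7, 8}}}, B1{{{1, 0}, {0, 1}}};
  const Mat2 X2{{{2, 1}, {0, 3}}}, A2{{{4, 2}, {-1, 5}}}, B2{{{2, 2}, {2, 2}}};
  const double k = std::sqrt(64.0); // e.g. sqrt(d_k) in self-attention

  // Original pattern: (X1*A1 + B1) * (X2*A2 + B2) / k.
  const Mat2 lhs = add(matmul(X1, A1), B1);
  const Mat2 rhs = add(matmul(X2, A2), B2);
  const Mat2 before = divScalar(matmul(lhs, rhs), k);

  // Rewritten pattern: (X1*A1 + B1) * (X2*(A2/k) + B2/k).
  const Mat2 rhsScaled = add(matmul(X2, divScalar(A2, k)), divScalar(B2, k));
  const Mat2 after = matmul(lhs, rhsScaled);

  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      assert(std::fabs(before[i][j] - after[i][j]) < 1e-9);
  std::puts("scaling A2 and B2 by 1/k matches dividing the MatMul output by k");
  return 0;
}
```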