
Optimization for Linear Quantization #2954

Merged 12 commits on Sep 26, 2024

Conversation

AlexandreEichenberger (Collaborator)

It turns out that while converting from float to int8 in one go does not generate efficient SIMD code, doing it in two steps (first float32 to int32, then int32 to int8) generates quite good code.

For 64K data points, the runtime went from 157us (original) to 124us (prior version with two loops) to 96us now, a roughly 1.6x overall speedup.
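The two-step conversion can be sketched in scalar C++ for illustration (the function and parameter names here are hypothetical; the actual lowering operates on MLIR values and SIMD vectors, not scalars):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize one float to int8 in two steps: float32 -> int32 (round and
// saturate), then int32 -> int8 (plain narrowing cast). Saturating to the
// int8 range in step 1 guarantees the narrowing in step 2 cannot overflow.
int8_t quantizeTwoStep(float x, float scale, int32_t zeroPoint) {
  // Step 1: float32 -> int32 with rounding, then saturation.
  int32_t q = static_cast<int32_t>(std::lround(x / scale)) + zeroPoint;
  q = std::clamp(q, static_cast<int32_t>(INT8_MIN),
      static_cast<int32_t>(INT8_MAX));
  // Step 2: int32 -> int8, a simple narrowing cast.
  return static_cast<int8_t>(q);
}
```

The intermediate int32 step is what lets the backend pick efficient vector float-to-int and integer-narrowing instructions instead of a single awkward float-to-int8 sequence.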

Interestingly, the z16 float-to-int instructions used here, vcfeb (signed) and vclfeb (unsigned), have a mode that rounds to nearest even. That mode cannot be accessed as-is (only via inline asm). If we were able to exploit it, we might further improve the linear quantization step.
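For reference, round-to-nearest-even (ties go to the even integer) is the behavior of std::nearbyint under the FE_TONEAREST rounding mode; a portable sketch, which of course does not emit the z16 instructions themselves:

```cpp
#include <cfenv>
#include <cmath>

// std::nearbyint honors the current floating-point rounding mode.
// FE_TONEAREST is round-to-nearest-even: ties such as 0.5 and 2.5
// round to the nearest even integer (0 and 2), not away from zero.
double roundNearestEven(double x) {
  std::fesetround(FE_TONEAREST); // the default mode, set here for clarity
  return std::nearbyint(x);
}
```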

Signed-off-by: Alexandre Eichenberger <[email protected]>
@tungld (Collaborator) left a comment

LGTM.


Type inputElementType = inputType.getElementType();
unsigned inputWidth;
if (isa<Float32Type>(inputElementType))
@tungld (Collaborator)

You can use

if (inputElementType.isF32() || inputElementType.isF64())
  inputWidth = mlir::cast<FloatType>(inputElementType).getWidth();
else
  llvm_unreachable("unsupported input type");

@AlexandreEichenberger (Collaborator, Author)

Thanks, will use.

bool isSigned = quantizedIntType.isSignless() || quantizedIntType.isSigned();
Type quantizedElementTypeInputSized;
if (isSigned) {
// Cannot use getIntegerType(inputWidth, true) as it returns signed ints.
@tungld (Collaborator) commented on Sep 26, 2024

If you would like to create i32 or i64, I guess we can use getIntegerType(inputWidth), which has a single parameter. Meanwhile, getIntegerType(inputWidth, true/false) with two parameters will emit si32/ui32 or si64/ui64.

@AlexandreEichenberger (Collaborator, Author)

Thanks, the interface is not very clear; I like your suggestion and will use it.

create.math.cast(quantizedElementTypeInputSized, saturateX);
// Reduce quantized precision.
Value res = create.math.cast(quantizedElementType, qSaturateXInputSized);
@tungld (Collaborator)

Do you think it would be beneficial to do this two-step cast inside a single create.math.cast, that is, when the source type is f32 and the target type is i8, do a two-step cast?

@AlexandreEichenberger (Collaborator, Author)

That would be the right way to do it, will add.

@AlexandreEichenberger (Collaborator, Author)

Migrated the functionality into cast for float -> small int (quantization). Also added the dequantization conversion here (small int -> float); that did not result in performance changes on z.
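The dequantization direction mentioned above can be sketched the same way (again a scalar illustration with hypothetical names, not the MLIR lowering itself):

```cpp
#include <cstdint>

// Dequantize int8 -> float via x = (q - zeroPoint) * scale, done as a
// widening int8 -> int32 cast followed by an int32 -> float32 conversion.
float dequantizeTwoStep(int8_t q, float scale, int32_t zeroPoint) {
  int32_t wide = static_cast<int32_t>(q);               // small int -> int32
  return static_cast<float>(wide - zeroPoint) * scale;  // int32 -> float32
}
```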

Value res = create.math.cast(quantizedElementType, buffVal);
create.krnl.storeIE(res, flatAlloc, {zero}, {loopInd[0]});
});

@tungld (Collaborator)

I see, so now we don't need to use the additional loop for conversion.

@AlexandreEichenberger (Collaborator, Author)

@tungld thanks for the very useful suggestions, much appreciated as always.

@AlexandreEichenberger AlexandreEichenberger merged commit f7d5895 into onnx:main Sep 26, 2024
7 checks passed
@jenkins-droid
Jenkins Linux ppc64le Build #14746 [push] Optimization for Linear ... started at 16:57

@jenkins-droid
Jenkins Linux amd64 Build #15716 [push] Optimization for Linear ... started at 15:45

@jenkins-droid
Jenkins Linux s390x Build #15719 [push] Optimization for Linear ... started at 16:45

@jenkins-droid
Jenkins Linux amd64 Build #15716 [push] Optimization for Linear ... passed after 1 hr 7 min

@jenkins-droid
Jenkins Linux s390x Build #15719 [push] Optimization for Linear ... passed after 1 hr 32 min

@jenkins-droid
Jenkins Linux ppc64le Build #14746 [push] Optimization for Linear ... passed after 2 hr 3 min

3 participants