Optimization for Linear Quantization #2954
Conversation
Signed-off-by: Alexandre Eichenberger <[email protected]>
LGTM.
Type inputElementType = inputType.getElementType();
unsigned inputWidth;
if (isa<Float32Type>(inputElementType))
You can use:
if (inputElementType.isF32() || inputElementType.isF64())
  inputWidth = mlir::cast<FloatType>(inputElementType).getWidth();
else
  llvm_unreachable("unsupported input type");
thanks, will use.
bool isSigned = quantizedIntType.isSignless() || quantizedIntType.isSigned();
Type quantizedElementTypeInputSized;
if (isSigned) {
  // Cannot use getIntegerType(inputWidth, true) as it returns signed ints.
If you would like to create i32 or i64, I guess we can use getIntegerType(inputWidth), which has a single parameter. Meanwhile, getIntegerType(inputWidth, true/false) with two parameters will emit si32/ui32 or si64/ui64.
Thanks, the interface is not very clear; I like your suggestion and will use it.
create.math.cast(quantizedElementTypeInputSized, saturateX);
// Reduce quantized precision.
Value res =
    create.math.cast(quantizedElementType, qSaturateXInputSized);
Do you think it would be beneficial to do this two-step cast inside a single create.math.cast, say, doing the two-step cast whenever the source type is f32 and the target type is i8?
That would be the right way to do it, will add.
Migrated the functionality into cast for float -> int small (quantization). Also added the dequantization conversion here (small int -> float); that did not result in performance changes on z.
Value res = create.math.cast(quantizedElementType, buffVal);
create.krnl.storeIE(res, flatAlloc, {zero}, {loopInd[0]});
});
I see, so now we don't need to use the additional loop for conversion.
Signed-off-by: Alexandre Eichenberger <[email protected]>
@tungld thanks for the very useful suggestions, much appreciated as always.
Jenkins Linux ppc64le Build #14746 [push] Optimization for Linear ... started at 16:57
Jenkins Linux amd64 Build #15716 [push] Optimization for Linear ... started at 15:45
Jenkins Linux s390x Build #15719 [push] Optimization for Linear ... started at 16:45
Jenkins Linux amd64 Build #15716 [push] Optimization for Linear ... passed after 1 hr 7 min
Jenkins Linux s390x Build #15719 [push] Optimization for Linear ... passed after 1 hr 32 min
Jenkins Linux ppc64le Build #14746 [push] Optimization for Linear ... passed after 2 hr 3 min
It turns out that while converting from float to int8 in one go does not generate efficient SIMD code, doing it in two steps (first float32 to int32, then int32 to int8) generates quite good code.
For 64K data points, the runtime went from 157us (original) to 124us (prior version with 2 loops) to now 96us, a roughly 1.6x overall speedup.
Interestingly, the z16 float-to-int asm operation used, vcfeb or vclfeb (signed vs unsigned), has a mode that does round to nearest even. It cannot be accessed as is (only via inlined asm). If we were able to exploit this, we might further improve the linear quantization step.