TinyQuant Core

A small framework for simulating low-bitwidth quantization in PyTorch.

As of August 2022, PyTorch itself offers various quantization options, but none for bitwidths below 8 bits.

This framework is for measuring the accuracy lost when a PyTorch model is quantized to fewer than 8 bits.

It is NOT aimed at efficiency, which is why it is implemented in plain PyTorch rather than CUDA/C++.
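To make the idea of measuring accuracy loss concrete, here is a minimal sketch of simulated ("fake") quantization in plain PyTorch, assuming per-tensor asymmetric min/max ranges. The function name `fake_quantize` is hypothetical and is not this repository's API. A tensor is rounded onto a 2^b-level integer grid and immediately mapped back to float, so downstream computation sees the quantization error without any integer kernels.

```python
import torch


def fake_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Hypothetical helper: quantize x to num_bits (affine scheme), then dequantize.

    The result is a float tensor that takes at most 2**num_bits distinct values,
    simulating low-bitwidth storage without integer kernels.
    """
    qmax = 2 ** num_bits - 1
    # Assumption: the real range is forced to contain 0 so zero is exactly representable.
    rmin = torch.clamp(x.min(), max=0.0)
    rmax = torch.clamp(x.max(), min=0.0)
    # Guard against division by zero for an all-zero tensor.
    scale = torch.clamp((rmax - rmin) / qmax, min=1e-8)
    zero_point = torch.clamp(torch.round(-rmin / scale), 0, qmax)
    # Quantize onto the integer grid {0, ..., qmax}.
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
    # Dequantize back to float.
    return (q - zero_point) * scale


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(256, 256)
    for bits in (8, 6, 4, 2):
        mse = (w - fake_quantize(w, bits)).pow(2).mean().item()
        print(f"{bits}-bit: MSE = {mse:.6f}")
```

The printed mean squared error grows as the bitwidth shrinks, which is the kind of degradation this framework is meant to expose at the model level.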

The quantization scheme is that of Jacob et al. (2018).
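For reference, the core of that scheme is an affine mapping between a real value r and an integer q via a scale S and a zero point Z. The block below writes it out for b-bit unsigned integers (the paper itself targets 8 bits; the b-bit generalization and the min/max choice of range are the assumptions used here), with a small worked example.

```latex
% Affine mapping of Jacob et al. (2018): real value r, integer q
r = S\,(q - Z)

% b-bit unsigned integers q \in \{0, \dots, 2^b - 1\}, real range [r_{\min}, r_{\max}] \ni 0
S = \frac{r_{\max} - r_{\min}}{2^b - 1}, \qquad
Z = \operatorname{round}\!\left(-\frac{r_{\min}}{S}\right), \qquad
q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{r}{S}\right) + Z,\ 0,\ 2^b - 1\right)

% Worked example: b = 2, [r_{\min}, r_{\max}] = [-0.5, 1.0]
S = \frac{1.5}{3} = 0.5, \quad Z = \operatorname{round}(1) = 1, \quad
\{\, S(q - Z) : q = 0, 1, 2, 3 \,\} = \{-0.5,\ 0,\ 0.5,\ 1.0\}
```

Because Z is an integer, the real value 0 maps exactly to q = Z, which the paper requires so that zero padding introduces no error.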

That paper presented the scheme for CNNs; here it is applied to the Transformer, with various options.

@inproceedings{jacob2018quantization,
  title={Quantization and training of neural networks for efficient integer-arithmetic-only inference},
  author={Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={2704--2713},
  year={2018}
}
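As an illustration of applying such simulated quantization to Transformer layers, here is a sketch that swaps `nn.Linear` modules for a variant with fake-quantized weights. `FakeQuantSTE`, `FakeQuantLinear` and `quantize_linears` are hypothetical names, not this repository's API. Weight ranges are taken per tensor from min/max, and gradients use a straight-through estimator (rounding treated as identity in the backward pass), a common choice for quantization-aware training that is assumed here. Only exact `nn.Linear` instances are swapped, so the projections inside `nn.MultiheadAttention` (which reads its weights directly instead of calling the submodule) are left untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FakeQuantSTE(torch.autograd.Function):
    """Hypothetical: quantize-dequantize in forward, pass gradients straight through."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, num_bits: int) -> torch.Tensor:
        qmax = 2 ** num_bits - 1
        rmin = torch.clamp(x.min(), max=0.0)  # range must contain zero
        rmax = torch.clamp(x.max(), min=0.0)
        scale = torch.clamp((rmax - rmin) / qmax, min=1e-8)
        zero_point = torch.clamp(torch.round(-rmin / scale), 0, qmax)
        q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Straight-through estimator: gradient w.r.t. x is passed unchanged;
        # num_bits is not a tensor, so it gets no gradient.
        return grad_output, None


class FakeQuantLinear(nn.Linear):
    """Hypothetical nn.Linear whose weights are fake-quantized on every forward."""

    def __init__(self, in_features: int, out_features: int,
                 num_bits: int = 4, bias: bool = True):
        super().__init__(in_features, out_features, bias=bias)
        self.num_bits = num_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = FakeQuantSTE.apply(self.weight, self.num_bits)
        return F.linear(x, w_q, self.bias)


def quantize_linears(module: nn.Module, num_bits: int) -> nn.Module:
    """Recursively replace every exact nn.Linear in module (in place)."""
    for name, child in module.named_children():
        if type(child) is nn.Linear:
            q = FakeQuantLinear(child.in_features, child.out_features,
                                num_bits, child.bias is not None)
            q.load_state_dict(child.state_dict())  # copy float weights/bias
            setattr(module, name, q)
        else:
            quantize_linears(child, num_bits)
    return module


if __name__ == "__main__":
    torch.manual_seed(0)
    # A Transformer-style feed-forward block: Linear -> GELU -> Linear.
    ffn = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
    x = torch.randn(10, 64)
    y_fp = ffn(x)  # full-precision reference, computed before the swap
    y_q = quantize_linears(ffn, num_bits=4)(x)
    print("relative output error:", ((y_fp - y_q).norm() / y_fp.norm()).item())
```

In this sketch only weights are quantized and activations stay in floating point, whereas Jacob et al. also quantize activations.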

Adapted from bklein/dl-frameworks-quantization

Repository contents:

.gitignore
LICENSE
README.md
__init__.py
batchnorm.py
calibration.py
config.py
histogram.py
kernel.py
lstm.py
modules.py
qtensor.py
quantizable_layer.py
quantization_functions.py
transformer.py

Repository: marvosyntactical/tqcore