marvosyntactical/tqcore

TinyQuant Core

A small framework for simulating low-bitwidth quantization in PyTorch.

PyTorch itself, as of August 2022, offers various quantization options, but none below 8 bits.

This is a small framework for measuring the accuracy loss incurred when quantizing a PyTorch model below 8 bits.

It is NOT aimed at efficiency (it is therefore implemented in plain PyTorch rather than CUDA/C++).

The quantization scheme used is that of Jacob et al. (2018).

It was presented for CNNs in that paper and is applied here to the Transformer, with various options.

@inproceedings{jacob2018quantization,
  title={Quantization and training of neural networks for efficient integer-arithmetic-only inference},
  author={Jacob, Benoit and Kligys, Skirmantas and Chen, Bo and Zhu, Menglong and Tang, Matthew and Howard, Andrew and Adam, Hartwig and Kalenichenko, Dmitry},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={2704--2713},
  year={2018}
}
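The core of that scheme is affine quantization: a float tensor is mapped onto integers in `[0, 2^b - 1]` via a real-valued scale and an integer zero point, then dequantized again so the rounding error can be simulated in float arithmetic. The sketch below is a minimal, hypothetical illustration of such "fake" quantization at a configurable bit-width; it is not tqcore's actual API.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Simulated affine quantization in the style of Jacob et al. (2018).

    Maps x onto integers in [0, 2**num_bits - 1] with a real-valued
    scale and an integer zero point, then immediately dequantizes,
    so the quantization error is simulated in float arithmetic.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0 so that zero maps exactly.
    x_min = min(x.min().item(), 0.0)
    x_max = max(x.max().item(), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # all-zero input: any positive scale works
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep in integer range
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale  # dequantize back to float
```

At 4 bits, for example, every value is snapped to one of 16 levels; comparing model accuracy with and without such fake-quantized weights and activations is exactly the kind of experiment this framework targets.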

Adapted from bklein/dl-frameworks-quantization
