Intel® Neural Compressor Sample for TensorFlow*

Low-precision optimizations can speed up inference. You can achieve higher inference performance by converting an FP32 model to INT8 or BF16. Additionally, Intel® Deep Learning Boost technology in the Second Generation Intel® Xeon® Scalable processors and newer Xeon® processors provides hardware acceleration for INT8 and BF16 models.

Intel® Neural Compressor simplifies the process of converting the FP32 model to INT8/BF16.

At the same time, Intel® Neural Compressor tunes the quantization method to reduce the accuracy loss, which is a major blocker for low-precision inference.
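To make the source of that accuracy loss concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is an illustration only, not the algorithm Intel® Neural Compressor uses; the function names and the sample weights are made up for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to INT8 codes (illustrative)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical FP32 weights, chosen only for this example.
weights = [0.02, -0.5, 0.314, 1.27, -1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

For values inside the quantization range, the round-trip error is at most half the scale. A few large outliers inflate the scale and therefore the error of every other value, which is one reason the choice of quantization method affects accuracy.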

Intel® Neural Compressor is part of Intel® AI Analytics Toolkit (AI Kit) and works with Intel® Optimizations for TensorFlow*.

Refer to the official web site for detailed information and news:


This sample shows the whole process of building a convolutional neural network (CNN) model to recognize handwritten numbers and increasing the inference performance by using Intel® Neural Compressor.

We will learn how to train a CNN model with Keras and TensorFlow, use Intel® Neural Compressor to quantize the model, and compare the performance to see the benefit of Intel® Neural Compressor.
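As a rough picture of the first step, the sketch below builds and trains a small Keras CNN for digit recognition. It is not the model from the notebook: the layer sizes, the function names `build_cnn` and `train_on_mnist`, and the use of the MNIST dataset bundled with Keras are all assumptions made for illustration.

```python
def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Build and compile a small CNN for 28x28 grayscale digit images."""
    # Imported lazily so that defining the function does not require TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
        keras.layers.MaxPooling2D(pool_size=2),
        keras.layers.Flatten(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def train_on_mnist(epochs=1):
    """Train the CNN on MNIST (assumed dataset) and return (model, test_accuracy)."""
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # Scale pixels to [0, 1] and add the channel axis the Conv2D layers expect.
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0
    model = build_cnn()
    model.fit(x_train, y_train, epochs=epochs, batch_size=128, verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return model, accuracy
```

Calling `train_on_mnist()` in the activated TensorFlow environment downloads the dataset on first use and returns the trained FP32 model, which is the starting point for quantization.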


Optimized for         Description
OS                    Linux* Ubuntu* 18.04 or later, Windows 10*
Hardware              The Second Generation Intel® Xeon® Scalable processor family or newer Xeon® processors
Software              Intel® AI Analytics Toolkit 2021.1 or later
What you will learn   How to use Intel® Neural Compressor tool to quantize the AI model based on TensorFlow* and speed up the inference on Intel® Xeon® CPUs
Time to complete      10 minutes

Intel® Neural Compressor and Sample Code Versions

This sample code is always updated for the Intel® Neural Compressor version in the latest Intel® AI Analytics Toolkit release.

If you want to get the sample code for an earlier toolkit release, check out the corresponding git tag.

List the available git tags:

git tag


Check out a git tag:

git checkout 2021.1-beta10

Key Implementation Details

  • Use Keras from TensorFlow* to build and train a CNN model.

  • Define a function and class for Intel® Neural Compressor to quantize the CNN model.

    The Intel® Neural Compressor can run on any Intel® CPU to quantize the AI model.

    The quantized AI model has better inference performance than the FP32 model on Intel CPUs.

    Specifically, the Second Generation Intel® Xeon® Scalable processors and newer Xeon® processors provide hardware acceleration for such tasks.

  • Test the performance of the FP32 model and INT8 (quantization) model.
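The last step can be approximated with a small timing helper. This is a sketch, not the benchmark code from the notebook: `benchmark`, `speedup`, and the stand-in predict function are hypothetical names, and in practice you would pass the predict call of the FP32 model and of the INT8 model.

```python
import time


def benchmark(predict, batch, warmup=2, runs=10):
    """Return (latency in seconds per call, throughput in samples per second)."""
    for _ in range(warmup):
        predict(batch)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        predict(batch)
    elapsed = time.perf_counter() - start
    latency = elapsed / runs
    throughput = len(batch) / latency if latency > 0 else float("inf")
    return latency, throughput


def speedup(fp32_latency, int8_latency):
    """How many times faster the INT8 model is than the FP32 model."""
    return fp32_latency / int8_latency


# Stand-in workload: replace the lambda with a real model's predict call.
batch = [[0.0] * 784 for _ in range(32)]
latency, throughput = benchmark(lambda b: [sum(x) for x in b], batch)
print(f"latency: {latency * 1000:.3f} ms, throughput: {throughput:.1f} samples/s")
```

Measuring both models with the same batch, warm-up count, and run count keeps the comparison fair; the ratio from `speedup` is then the figure to report alongside the accuracy of each model.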

Prepare Software Environment

Linux (Ubuntu)

You can run this sample in a Jupyter notebook on your local computer or in the Intel® DevCloud.

Note: If you have not already done so, set up your CLI environment by sourcing the setvars script located in the root of your oneAPI installation.

Linux Sudo: . /opt/intel/oneapi/

Linux User: . ~/intel/oneapi/

Windows: C:\Program Files (x86)\Intel\oneAPI\setvars.bat

For more information on environment variables, see Use the setvars Script for Linux or macOS, or Windows.

  1. Install Intel® AI Analytics Toolkit.

    If you use the Intel® DevCloud, skip this step. The toolkit is already installed for you.

    For installation instructions, refer to Intel® AI Analytics Toolkit Installation Guides.

    Intel® Optimizations for TensorFlow* is included in Intel® AI Analytics Toolkit, so you do not have to install it separately.

    This sample depends on TensorFlow 2.2* or newer.

  2. Activate the conda environment with Intel® Optimizations for TensorFlow*.

    You can list the available conda environments with the following command:

    conda info -e
    # conda environments:
    base                  *  /opt/intel/oneapi/intelpython/latest
    pytorch                  /opt/intel/oneapi/intelpython/latest/envs/pytorch
    pytorch-1.7.0            /opt/intel/oneapi/intelpython/latest/envs/pytorch-1.7.0
    tensorflow               /opt/intel/oneapi/intelpython/latest/envs/tensorflow
    tensorflow-2.3.0         /opt/intel/oneapi/intelpython/latest/envs/tensorflow-2.3.0

    By default, the Intel® AI Analytics Toolkit is installed in the /opt/intel/oneapi folder, and managing it requires root privileges.

    • If you have root access to your oneAPI installation path:

      conda activate tensorflow
      (tensorflow) xxx@yyy:
    • If you do not have root access to your oneAPI installation path, clone the tensorflow conda environment using the following command:

      conda create --name usr_tensorflow --clone tensorflow

      Then activate your conda environment with the following command:

      source activate usr_tensorflow
  3. Install Intel® Neural Compressor from the local channel.

    conda install -c ${ONEAPI_ROOT}/conda_channel neural-compressor -y --offline
  4. Install Jupyter Notebook.

    Skip this step if you are working in the DevCloud.

    python -m pip install notebook
  5. Create a new kernel for the Jupyter notebook based on your activated conda environment.

    conda install ipykernel
    python -m ipykernel install --user --name usr_tensorflow

    This step is optional if you plan to open the notebook on your local server.

Windows 10

Set up the conda environment user_tensorflow with the following commands:

conda deactivate
conda env remove -n user_tensorflow
conda create -n user_tensorflow python=3.9 -y
conda activate user_tensorflow
conda install -n user_tensorflow pycocotools -c esri -y
conda install -n user_tensorflow neural-compressor tensorflow -c conda-forge -c intel -y
conda install -n user_tensorflow jupyter runipy notebook -y
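Whichever setup you used, a quick check from inside the activated environment can confirm that the packages this sample needs are importable. The sketch below uses only the Python standard library. The mapping of import names to distribution names is an assumption: Intel builds of TensorFlow may be published under a different distribution name, in which case the version is reported as unknown.

```python
import importlib.util
from importlib import metadata


def check_packages(requirements):
    """Map each import name to its installed version, or None if it is missing.

    `requirements` maps import names to (assumed) distribution names.
    """
    report = {}
    for module_name, dist_name in requirements.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # not importable in this environment
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under another distribution name.
            report[module_name] = "unknown"
    return report


# Assumed import-name -> distribution-name pairs for this sample.
REQUIRED = {
    "tensorflow": "tensorflow",
    "neural_compressor": "neural-compressor",
    "notebook": "notebook",
}

for name, version in check_packages(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, the wrong conda environment is probably active or one of the install steps above was skipped.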

Run the Sample

You can run the Jupyter notebook with the sample code on your local server or use Intel® DevCloud.

Run the Sample on Local Server

To open the Jupyter notebook on your local server:

  1. Make sure you activate the conda environment.

    source /opt/intel/oneapi/
    conda activate tensorflow


    If you cloned the environment because you do not have root access, activate the clone instead:

    conda activate usr_tensorflow
  2. Start the Jupyter notebook server.

    Run the script that is located in the sample code directory:


    The Jupyter server prints the URLs of the web application in your terminal.

    (tensorflow) xxx@yyy:$ [I 09:48:12.622 NotebookApp] Serving notebooks from local directory:
    [I 09:48:12.622 NotebookApp] Jupyter Notebook 6.1.4 is running at:
    [I 09:48:12.622 NotebookApp] http://yyy:8888/?token=146761d9317552c43e0d6b8b6b9e1108053d465f6ca32fca
    [I 09:48:12.622 NotebookApp]  or
    [I 09:48:12.622 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 09:48:12.625 NotebookApp]
    	To access the notebook, open this file in a browser:
    	Or copy and paste one of these URLs:
    [I 09:48:26.128 NotebookApp] Kernel started: bc5b0e60-058b-4a4f-8bad-3f587fc080fd, name: python3
    [IPKernelApp] ERROR | No such comm target registered: jupyter.widget.version
  3. In a web browser, open the link that the Jupyter server displayed when you started it. For example: http://yyy:8888/?token=146761d9317552c43e0d6b8b6b9e1108053d465f6ca32fca.

  4. In the Notebook Dashboard, click inc_sample_tensorflow.ipynb to open the notebook.

  5. Run the sample code and read the explanations in the notebook.

Run the Sample in the Intel® DevCloud

  1. Open the following link in your browser:

  2. In the Notebook Dashboard, navigate to the inc_sample_tensorflow.ipynb file and open it.

  3. To change the kernel, click Kernel > Change kernel > usr_tensorflow.

  4. Run the sample code and read the explanations in the notebook.

Build and Run Additional Samples

Several sample programs are available for you to try, many of which can be compiled and run in a similar fashion to this Intel® Neural Compressor sample for TensorFlow*. Experiment with running the various samples on different kinds of compute nodes or adjust their source code to experiment with different workloads.


If an error occurs, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits.

Using Visual Studio Code* (Optional)

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.

The basic steps to build and run a sample using VS Code include:

  • Download a sample using the extension Code Sample Browser for Intel oneAPI Toolkits.
  • Configure the oneAPI environment with the extension Environment Configurator for Intel oneAPI Toolkits.
  • Open a terminal in VS Code (Terminal > New Terminal).
  • Run the sample in the VS Code terminal using the instructions in this README.
  • (Linux only) Debug your GPU application with GDB for Intel® oneAPI toolkits using the Generate Launch Configurations extension.

To learn more about the extensions, see Using Visual Studio Code with Intel® oneAPI Toolkits.

After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.


Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses can be found here: third-party-programs.txt