About the int8 training question


As far as I know, PyTorch currently only supports floating-point (e.g. float32) training. I wonder if you have any ideas or code examples for int8 training, that is, taking an int8 DNN model and retraining it in an int8 environment. There are one or two papers about int8 training, but I can’t find the code.

Thanks in advance!

PyTorch has int8 dtype compatibility. It’s referred to as “quantization”. See here:


Thanks for your answer. I know PyTorch can fake-quantize the DNN model to int8 at the end, but the training process itself is still in float32. What I want to know is whether, and how, PyTorch can train the DNN in int8.


Dynamic Quantization

The easiest method of quantization PyTorch supports is called dynamic quantization. This involves not just converting the weights to int8 - as happens in all quantization variants - but also converting the activations to int8 on the fly, just before doing the computation (hence “dynamic”). The computations will thus be performed using efficient int8 matrix multiplication and convolution implementations, resulting in faster compute. However, the activations are read and written to memory in floating point format.
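To make the description above concrete, here is a minimal sketch in plain Python. It is not how PyTorch implements dynamic quantization; the symmetric max-based scale and the helper names (`quantize`, `symmetric_scale`, `dynamic_linear`) are assumptions made up for illustration. It only shows the flow: weights are converted to int8 once, activations are converted on the fly, the multiply-accumulate runs on integers, and the result goes back to float.

```python
# Illustration only: plain Python, not the PyTorch implementation.
# All helper names and the scaling scheme are hypothetical.

def symmetric_scale(values):
    """Pick a scale so that the largest magnitude maps to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0

def quantize(values, scale):
    """Round values/scale to integers and clamp to the int8 range."""
    return [max(-128, min(127, round(v / scale))) for v in values]

# Weights are quantized once, ahead of time.
weights = [0.5, -1.0, 0.25]
w_scale = symmetric_scale(weights)
w_q = quantize(weights, w_scale)

def dynamic_linear(activations):
    # Activations arrive as floats and are quantized just before the compute.
    a_scale = symmetric_scale(activations)
    a_q = quantize(activations, a_scale)
    # Integer multiply-accumulate, standing in for an int8 kernel.
    acc = sum(a * w for a, w in zip(a_q, w_q))
    # The output is rescaled and written back as a float.
    return acc * a_scale * w_scale

x = [2.0, 1.0, -4.0]
y_float = sum(a * w for a, w in zip(x, weights))  # float reference
y_int8 = dynamic_linear(x)                        # close to y_float, not identical
```

Note that nothing here involves gradients: this is an inference-time trick, which is why it does not answer the int8 training question by itself.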


Hi Johnson,

Thanks for your answer. I read the blog, but “dynamic quantization” as described there still only helps you obtain an int8 model; it does not train the model in an int8 environment. For pure int8 training, I think the backpropagation (BP) and related parts would need to be rewritten. I think this paper is what I want, “Towards Unified INT8 Training for Convolutional Neural Network”, but I can’t find the code.

Thanks again for your answer! Have a nice day!

I’m wondering if you read it or just skimmed it. The blog provides 3 methods for quantization. Only the last (3rd) method is “fake” quantization. There are two other methods in the blog, including dynamic quantization, which is not considered (at least by most) to be “fake” quantization.

Yes, I tried the methods of quantization, but I don’t think they are the methods that I want. I want “training”, not “inference” or using float to simulate the “int”.

A similar question is here:

Hi @UmeTFE , the official PyTorch quantization APIs are geared towards inference. They support QAT, but QAT is still training in floating point and converting to integer for inference. You can check out torch.amp for training, but that only supports floating point types.

Doing training in the integer domain is not currently in core PyTorch, it’s more of an open research area as far as I know. Would love to hear more about the motivation for this.


Hi @UmeTFE
Training networks in an integer type is tricky because the backpropagated gradients are often small and have a high probability of being rounded to 0 when converted to int.
The most straightforward consequence is that the network does not learn at all, or learns very slowly.
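A tiny sketch of that effect, in plain Python rather than PyTorch. The quantization step (`scale`) and the gradient values are hypothetical numbers picked for illustration, and `to_int8` is a made-up helper, not a library function.

```python
# Illustration only: plain Python, not PyTorch. Scale and values are hypothetical.

def to_int8(value, scale):
    """Round value/scale to the nearest integer and clamp to the int8 range."""
    return max(-128, min(127, round(value / scale)))

scale = 0.01                       # assumed: one integer step represents 0.01
small_grads = [0.004, -0.003, 0.0005]

# Every gradient is smaller than half a step, so each one rounds to 0.
grads_q = [to_int8(g, scale) for g in small_grads]

weight_q = to_int8(0.5, scale)     # integer weight, 50 on this grid
for g_q in grads_q:
    weight_q -= g_q                # integer update: the weight never moves

# The same updates applied in float do move the weight.
weight_f = 0.5
for g in small_grads:
    weight_f -= g
```

After the loop `weight_q` is still 50 while `weight_f` has changed, which is the "not learning at all" case in miniature.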

As far as I know, this is an active area of research, but quantization is still mostly used as an inference-only technique. As @Vasiliy_Kuznetsov mentioned, PyTorch currently supports three major quantization techniques: dynamic quantization, PTSQ (post-training static quantization), and QAT (quantization-aware training).

With QAT, the network is made aware during training that quantization will be applied at inference. In other words, the quantization error is already present while the network trains, so (hopefully) the metrics do not degrade much once the model is actually quantized for inference.
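Here is a rough sketch of the QAT idea in plain Python, to show why it still counts as float training. It is not the PyTorch QAT API: the fixed `scale`, the single-weight model, and the helper `fake_quantize` are assumptions for illustration. The forward pass sees the weight snapped to the int8 grid, while the stored weight and the gradient stay in float (a straight-through estimator is assumed for the rounding step).

```python
# Illustration only: a one-weight toy model, not the PyTorch QAT implementation.

def fake_quantize(value, scale):
    """Snap a float to the int8 grid, then return it as a float again."""
    q = max(-128, min(127, round(value / scale)))
    return q * scale

scale = 0.05       # hypothetical fixed quantization step
target_w = 0.73    # weight a pure-float model would learn for y = w * x
w = 0.0            # latent weight, kept in float for the whole training run
lr = 0.1
samples = [1.0, 2.0, -1.5]

for _ in range(200):
    for x in samples:
        w_fq = fake_quantize(w, scale)       # forward pass uses the quantized weight
        error = w_fq * x - target_w * x
        # Straight-through estimator (assumed): treat d(w_fq)/dw as 1.
        grad = 2 * error * x
        w -= lr * grad                       # update is done in float

# Only at the end is the weight converted for int8 inference.
w_deployed = fake_quantize(w, scale)
```

The deployed weight ends up on a grid point next to 0.73, but every update along the way was a float operation, which is the difference from true int8 training.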



Thank you so much for your reply.

I want to test a research idea (which needs both the training and inference processes) on an embedded system, but I don’t want to install PyTorch on it. That’s why I wonder if someone has already done similar work.

Thanks again for the answer.


Thanks for the answer. It seems this question is still a research topic, and PyTorch is more focused on inference.

Thanks again for the answer and have a nice day!

@UmeTFE ,

Obviously I don’t have the full context of your problem, but usually people train in floating point and convert to integer for inference. If your embedded environment supports fp32 or fp16, doing training in those dtypes should be easier (and probably supported by whichever framework you end up using) than trying to get integer training working.

I agree with you, maybe int8 training is not a practical technique.

@UmeTFE just curious if you’ve perhaps referred to/know of some literature around training NNs in integer precision?



There are a few papers about this, like the link I pasted above, but not so many. Most of the work tries to use float to simulate int and reduce the “float to int” error.

I didn’t find any open-source code for int training, and I feel it’s too much work for one person to build an int training framework.