I know that PyTorch currently only supports float32 training, but I wonder if you have any ideas or code examples for int8 training, that is, taking an int8 DNN model and retraining it entirely in an int8 environment. There are one or two papers about int8 training, but I can’t find the code.
Thanks for your answer. I know PyTorch can fake-quantize the DNN model to int8 in the end, but that training process still runs in float32. What I want to know is whether, and how, PyTorch can train the DNN in int8.
The easiest method of quantization PyTorch supports is called dynamic quantization. This involves not just converting the weights to int8 - as happens in all quantization variants - but also converting the activations to int8 on the fly, just before doing the computation (hence “dynamic”). The computations are thus performed using efficient int8 matrix multiplication and convolution implementations, resulting in faster compute. However, the activations are read from and written to memory in floating-point format.
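For example, here is a minimal sketch of dynamic quantization with the eager-mode API (the toy model is just for illustration):

```python
import torch
import torch.nn as nn

# A toy model for illustration only
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Convert Linear weights to int8; activations are quantized on the fly
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # inference uses int8 matmuls internally
```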
Thanks for your answer. I read the blog, but dynamic quantization still seems to help you get an int8 model for inference, not to train the model in an int8 environment. For pure int8 training, I think the backpropagation (and more) needs to be rewritten. I think this paper is what I want, “Towards Unified INT8 Training for Convolutional Neural Network”, but I can’t find the code.
I’m wondering if you read it or just skimmed it. The blog provides three methods for quantization, and only the last (third) method uses “fake” quantization. The other two methods, including dynamic quantization, are not considered (at least by most) to be “fake” quantization.
Yes, I tried those quantization methods, but they aren’t what I want. I want “training” in int8, not “inference”, and not using float to simulate int.
Hi @UmeTFE, the official PyTorch quantization APIs are geared towards inference. They support QAT, but QAT still trains in floating point and converts to integer for inference. You can check out torch.amp for training, but that only supports floating-point types.
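As a rough sketch, a torch.amp mixed-precision training loop looks like this (the model, optimizer, and data below are placeholders; only the amp pattern matters):

```python
import torch

# Hypothetical model, optimizer, and data for illustration
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(32, 128, device="cuda")
    target = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    # Forward pass runs in float16 where safe; master weights stay float32
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```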
Doing training in the integer domain is not currently in core PyTorch; as far as I know, it’s more of an open research area. Would love to hear more about the motivation for this.
Hi @UmeTFE
Training networks in an integer type is tricky because backpropagated gradients are typically very small and have a high probability of rounding to 0 in int8.
The most straightforward consequence is that the network learns very slowly or does not learn at all.
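A quick way to see the underflow problem: quantize some typical small gradient values to int8 with a plausible (made-up) scale and watch them all collapse to zero:

```python
import torch

grads = torch.tensor([0.003, -0.0007, 0.02, 0.0001])  # typical small gradients
scale = 0.1  # hypothetical int8 quantization scale

q = torch.clamp(torch.round(grads / scale), -128, 127).to(torch.int8)
print(q)  # tensor([0, 0, 0, 0], dtype=torch.int8) -> the update vanishes
```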
As far as I know, this is an active area of research, but quantization is still mostly used as an inference-only technique. As @Vasiliy_Kuznetsov mentioned, PyTorch currently supports three major quantization techniques: dynamic quantization, PTSQ (post-training static quantization), and QAT (quantization-aware training).
With QAT, during training we make the network aware that quantization will be applied at inference time. In other words, the network trains while simulating quantization, which (hopefully) keeps the metrics from worsening much once quantization is actually performed at inference.
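For concreteness, here is a minimal eager-mode QAT sketch, assuming a toy model with QuantStub/DeQuantStub markers (real models usually also need module fusion before prepare_qat):

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Toy model for illustration; QuantStub/DeQuantStub mark the quantized region
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net()
model.train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)  # insert fake-quant observers

# ... normal fp32 training loop here; fake-quant simulates int8 rounding ...

model.eval()
int8_model = tq.convert(model)  # real int8 weights for inference
```

Note that the training loop itself still runs in float32; the fake-quant modules only simulate the rounding error that int8 will introduce later.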
I want to test a research idea (which needs both the training and inference processes) on an embedded system, but I don’t want to install PyTorch on it. That’s why I’m wondering if someone has already done similar work.
Obviously I don’t have the full context of your problem, but usually people train in floating point and convert to integer for inference. If your embedded environment supports fp32 or fp16, doing training in those dtypes should be easier (and probably supported by whichever framework you end up using) than trying to get integer training working.
There are a few papers about this, like the link I pasted above, but not that many. Most of the work tries to use float to simulate int and reduce the float-to-int error.
I didn’t find any open-source code for int training, and I feel building an int-training framework is too much work for one person.
Hi, I’m looking for INT8 training too. I found that most methods are about INT8 inference, like TensorRT. The closest thing I found is NVIDIA Apex for PyTorch, but it doesn’t support INT8 training. Have you found any methods that support training in INT8? If you have some ideas, please contact me, thanks!
There are some efforts toward training in int8 (or some variant thereof) for improved speed, but in general the PyTorch AO team doesn’t have tools to support that at the moment. We focus on taking trained models and optimizing them for inference performance, with some tools oriented towards fine-tuning the accuracy of the quantized model so that it better matches the original floating-point model.
If you want more information about low-precision training, NVIDIA’s microscaling formats paper (https://arxiv.org/pdf/2310.10537) has been getting a lot of attention.