I know that PyTorch currently only supports float32 training, but I wonder if you have any ideas or code examples for int8 training, that is, taking an int8 DNN model and retraining it entirely in an int8 environment. There are one or two papers about int8 training, but I can’t find the code.
Thanks for your answer. I know PyTorch can fake-quantize the DNN model to int8 in the end, but that training process still runs in float32. What I want to know is whether, and how, PyTorch can train the DNN in int8.
The easiest method of quantization PyTorch supports is called dynamic quantization. This involves not just converting the weights to int8 - as happens in all quantization variants - but also converting the activations to int8 on the fly, just before doing the computation (hence “dynamic”). The computations are thus performed using efficient int8 matrix multiplication and convolution implementations, resulting in faster compute. However, the activations are read from and written to memory in floating-point format.
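For example, here is a minimal sketch of dynamic quantization with the eager-mode API (the toy model is just for illustration):

```python
import torch
import torch.nn as nn

# A toy model for illustration only
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Convert Linear weights to int8; activations are quantized on the fly
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # inference uses int8 matmuls internally
```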
Thanks for your answer. I read the blog, but dynamic quantization still seems to help you get an int8 model for inference, not to train the model in an int8 environment. For pure int8 training, I think the backpropagation (and more) needs to be rewritten. I think this paper is what I want, “Towards Unified INT8 Training for Convolutional Neural Network”, but I can’t find the code.
I’m wondering if you read it or just skimmed it. The blog provides three methods for quantization, and only the last (third) method uses “fake” quantization. The other two methods, including dynamic quantization, are not considered (at least by most) to be “fake” quantization.
Yes, I tried those quantization methods, but they aren’t what I want. I want “training” in int8, not “inference”, and not using float to simulate int.
Hi @UmeTFE, the official PyTorch quantization APIs are geared towards inference. They support QAT, but QAT still trains in floating point and converts to integer for inference. You can check out torch.amp for training, but that only supports floating-point types.
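As a rough sketch, a torch.amp mixed-precision training loop looks like this (the model, optimizer, and data below are placeholders; only the amp pattern matters):

```python
import torch

# Hypothetical model, optimizer, and data for illustration
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(32, 128, device="cuda")
    target = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    # Forward pass runs in float16 where safe; master weights stay float32
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x), target)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```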
Doing training in the integer domain is not currently in core PyTorch; as far as I know, it’s more of an open research area. Would love to hear more about the motivation for this.
Hi @UmeTFE
Training networks in an integer type is tricky because backpropagated gradients are typically very small and have a high probability of rounding to 0 in int8.
The most straightforward consequence is that the network learns very slowly or does not learn at all.
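A quick way to see the underflow problem: quantize some typical small gradient values to int8 with a plausible (made-up) scale and watch them all collapse to zero:

```python
import torch

grads = torch.tensor([0.003, -0.0007, 0.02, 0.0001])  # typical small gradients
scale = 0.1  # hypothetical int8 quantization scale

q = torch.clamp(torch.round(grads / scale), -128, 127).to(torch.int8)
print(q)  # tensor([0, 0, 0, 0], dtype=torch.int8) -> the update vanishes
```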
As far as I know, this is an active area of research, but quantization is still mostly used as an inference-only technique. As @Vasiliy_Kuznetsov mentioned, PyTorch currently supports three major quantization techniques: dynamic quantization, PTSQ (post-training static quantization), and QAT (quantization-aware training).
With QAT, during training we make the network aware that quantization will be applied at inference time. In other words, the network trains while simulating quantization, which (hopefully) keeps the metrics from worsening much once quantization is actually performed at inference.
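For concreteness, here is a minimal eager-mode QAT sketch, assuming a toy model with QuantStub/DeQuantStub markers (real models usually also need module fusion before prepare_qat):

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Toy model for illustration; QuantStub/DeQuantStub mark the quantized region
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net()
model.train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)  # insert fake-quant observers

# ... normal fp32 training loop here; fake-quant simulates int8 rounding ...

model.eval()
int8_model = tq.convert(model)  # real int8 weights for inference
```

Note that the training loop itself still runs in float32; the fake-quant modules only simulate the rounding error that int8 will introduce later.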
I want to test a research idea (which needs both the training and inference processes) on an embedded system, but I don’t want to install PyTorch on it. That’s why I’m wondering if someone has already done similar work.
Obviously I don’t have the full context of your problem, but usually people train in floating point and convert to integer for inference. If your embedded environment supports fp32 or fp16, doing training in those dtypes should be easier (and probably supported by whichever framework you end up using) than trying to get integer training working.
There are a few papers about this, like the link I pasted above, but not that many. Most of the work tries to use float to simulate int and reduce the float-to-int error.
I didn’t find any open-source code for int training, and I feel building an int-training framework is too much work for one person.
Hi, I’m looking for INT8 training too. I found that most methods are about INT8 inference, like TensorRT. The closest thing I found is NVIDIA Apex for PyTorch, but it doesn’t support INT8 training. Have you found any methods that support training in INT8? If you have some ideas, please contact me, thanks!
There are some efforts toward training in int8 (or some variant thereof) for improved speed, but in general the PyTorch AO team doesn’t have tools to support that at the moment. We focus on taking trained models and optimizing them for inference performance, with some tools oriented towards fine-tuning the accuracy of the quantized model so that it better matches the original floating-point model.
If you want more information about low-precision training, NVIDIA’s microscaling formats paper (https://arxiv.org/pdf/2310.10537) has been getting a lot of attention.