Problem about training with int8

Hello everyone,

Recently we have been focusing on training with int8, as opposed to int8 inference. Given the numerical limitations of int8, we initially keep all parameters in fp32 and quantize only the convolution layers (performing the computation in int8), since convolution is the most compute-intensive part of a model. Over the past months we have made some progress (accuracy comparable to fp32 training, with faster training speed), and our paper has been accepted at CVPR 2020.

Now we want to go further and quantize other layers such as ReLU, pooling, and BatchNorm, keeping the dataflow in int8 to save memory.
(1) However, we cannot use int8 tensors as the input and output of a layer in PyTorch because of how autograd works. Would you consider supporting int8 tensors in the future?
(2) Moreover, we want to pass quantization parameters such as the scale from layer to layer, but this runs into problems at shortcut connections. During the backward pass, autograd automatically adds the gradient from the main path and the gradient from the shortcut connection. Because we attach the quantization parameters to the tensor, they are lost after this add operation. Could you suggest a way to handle this?
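
To make the two points concrete, here is a minimal sketch. The first part reproduces the autograd restriction on int8 tensors. The second part shows one possible way to combine two int8 gradients that carry different scales at a shortcut; `add_quantized` is a hypothetical helper and symmetric per-tensor scaling is an assumption, not necessarily what the paper uses.

```python
import torch

# (1) Autograd only tracks floating-point (and complex) tensors, so an
# int8 tensor cannot be marked as requiring a gradient.
try:
    torch.zeros(4, dtype=torch.int8, requires_grad=True)
    int8_requires_grad_ok = True
except RuntimeError as err:
    int8_requires_grad_ok = False
    print("int8 requires_grad failed:", err)


# (2) Adding the raw int8 payloads is meaningless when the scales differ,
# so the add has to be scale-aware.
def add_quantized(qa, sa, qb, sb):
    """Hypothetical helper: add two int8 gradients with different scales.

    Both operands are dequantized, summed in fp32, and requantized with a
    fresh scale, so the result still carries a valid scale.
    """
    total = qa.to(torch.float32) * sa + qb.to(torch.float32) * sb
    scale = total.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(total / scale).clamp(-127, 127).to(torch.int8)
    return q, scale


g_main = torch.tensor([100, -50, 25], dtype=torch.int8)  # main path, scale 0.01
g_skip = torch.tensor([10, 20, -30], dtype=torch.int8)   # shortcut, scale 0.10
q_sum, s_sum = add_quantized(g_main, torch.tensor(0.01),
                             g_skip, torch.tensor(0.10))
print(q_sum, s_sum)
```

Since autograd performs the shortcut addition itself, a helper like this would have to run inside a custom backward rather than relying on the built-in gradient accumulation.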

Many thanks.


I think there is some work planned for int8 training; cc @raghuramank100 for more details.

Hi @raghuramank100 @hx89, I am also one of the authors of the INT8 training paper at CVPR 2020.
First, I want to clarify that we are focused on using INT8 computation to speed up the training process, not on quantization-aware training. This means we need to quantize the gradients in the convolution backward pass, not only the forward pass.
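
As a rough illustration of quantizing both directions, the sketch below wraps a convolution in a custom `torch.autograd.Function`. It only simulates int8 by rounding to integer levels stored in float32 tensors; it is not our DP4A kernel. The names `quantize_int8` and `Int8SimConv2d`, the symmetric per-tensor scale, and the fixed `padding=1` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def quantize_int8(x):
    """Symmetric per-tensor quantization to the int8 range [-127, 127].

    Returns integer-valued levels (kept in float32 so ordinary conv
    kernels can consume them) together with the fp32 scale.
    """
    scale = x.detach().abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-127, 127)
    return q, scale


class Int8SimConv2d(torch.autograd.Function):
    """Hypothetical conv whose forward and backward both use int8 levels."""

    @staticmethod
    def forward(ctx, x, weight):
        qx, sx = quantize_int8(x)
        qw, sw = quantize_int8(weight)
        ctx.save_for_backward(qx, qw)
        ctx.scales = (sx, sw)
        # Integer-valued conv, then a single rescale back to fp32.
        return F.conv2d(qx, qw, padding=1) * (sx * sw)

    @staticmethod
    def backward(ctx, grad_out):
        qx, qw = ctx.saved_tensors
        sx, sw = ctx.scales
        qg, sg = quantize_int8(grad_out)  # the gradient is quantized too
        grad_x = torch.nn.grad.conv2d_input(
            qx.shape, qw, qg, padding=1) * (sg * sw)
        grad_w = torch.nn.grad.conv2d_weight(
            qx, qw.shape, qg, padding=1) * (sg * sx)
        return grad_x, grad_w


torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)  # fp32 master weights
y = Int8SimConv2d.apply(x, w)
y.sum().backward()
print(y.shape, x.grad.shape, w.grad.shape)
```

The gradients land on the fp32 `x` and `w`, so a regular optimizer can still update the fp32 master weights.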

We now see almost no accuracy drop (< 1% top-1) on ImageNet for ResNet, MobileNet, Inception, and ShuffleNet. Even on detection tasks on Pascal VOC and COCO with RetinaNet and Faster R-CNN, we see only about a 1% mAP drop. You can check the accuracy tables in the paper:

(see Tables 7, 8, and 9)

We have also considered how to implement the INT8 computation and reduce its overhead (see Section 3.6, General Purpose Training Framework).

We implemented it with DP4A on a GTX 1080 Ti and obtained a 1.6x (forward) and 1.9x (backward) speedup on convolution over cuDNN:

(see Figure 8)
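
For readers unfamiliar with DP4A: it is a GPU instruction that multiplies four pairs of 8-bit integers and adds the products to a 32-bit accumulator. The pure-Python sketch below only emulates those semantics to show how an int8 dot product is built from it; the function names are made up for illustration and this is not our CUDA kernel.

```python
def dp4a(a_bytes, b_bytes, acc):
    """Emulate DP4A semantics: acc + sum of four int8 * int8 products."""
    assert len(a_bytes) == len(b_bytes) == 4
    assert all(-128 <= v <= 127 for v in (*a_bytes, *b_bytes))
    return acc + sum(a * b for a, b in zip(a_bytes, b_bytes))


def int8_dot(a, b):
    """Dot product of two int8 vectors, four elements per DP4A step."""
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0  # stands in for the 32-bit accumulator
    for i in range(0, len(a), 4):
        acc = dp4a(a[i:i + 4], b[i:i + 4], acc)
    return acc


a = [127, -128, 5, 0, 1, 2, 3, 4]
b = [2, 1, -3, 9, 4, 3, 2, 1]
result = int8_dot(a, b)
print(result)
```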

We are now planning INT8 Training V2, which quantizes all CNN layers, including ReLU, pooling, and Add. However, we found that PyTorch does not yet support INT8 gradients in the backward pass, so we would need some help from the PyTorch team.

It would be great to hear your response!


We are interested in your research. Could you share your code?

Thanks, quantized training is one of our future directions.

What do you mean by “pytorch not yet support int8 gradient backward”? Can you elaborate in more detail?