Hi all, I just started learning model quantization.
My model is divided into a front end (CNN-based) and a back end (GRU-based), so I used static quantization for the front end and dynamic quantization for the back end. With dynamic quantization alone, accuracy drops only slightly, which looks like a correct result. But once I add static quantization on top, accuracy declines dramatically. I don't know whether there is a problem with my usage.
Here is how I used them:
```python
model = CNN_GRU()
model.eval()
list_mix = [['...', '...']]

# dynamic quantization (GRU back end)
model = torch.quantization.quantize_dynamic(
    model, {nn.GRU, nn.Linear}, dtype=torch.qint8)

# static quantization (CNN front end)
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.backends.quantized.engine = 'fbgemm'
model_fp32_fused = torch.quantization.fuse_modules(model, list_mix)
model_fp32_prepared = torch.quantization.prepare(model_fp32_fused)

# calibration pass: run representative data through the prepared model
for _, input in enumerate(loader):
    video = input.get('video')
    border = input.get('duration').float()
    model_fp32_prepared(video, border)

model_int8 = torch.quantization.convert(model_fp32_prepared)

# inference
pred = model_int8(video, border)
```
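For context on where static quantization's accuracy loss comes from: during the calibration pass the observers record each activation's float range, and `convert()` turns that range into a `(scale, zero_point)` pair. Below is a minimal pure-Python sketch of that affine uint8 round trip (my own simplified version with made-up function names, not PyTorch's internals). Note how any value outside the calibrated range gets clipped, which is one reason a calibration set that doesn't match the inference distribution can hurt accuracy badly.

```python
# Illustrative sketch of affine uint8 quantize/dequantize (simplified,
# not the actual PyTorch implementation).

def calc_qparams(min_val, max_val, qmin=0, qmax=255):
    """Map an observed float range onto the uint8 grid."""
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)  # range must cover 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = min(qmax, max(qmin, round(qmin - min_val / scale)))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # values outside the calibrated range are clipped here
    return min(qmax, max(qmin, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = calc_qparams(-1.0, 1.0)  # pretend calibration only saw [-1, 1]
ok = dequantize(quantize(0.5, scale, zp), scale, zp)       # small rounding error
clipped = dequantize(quantize(3.0, scale, zp), scale, zp)  # clipped back near 1.0
```

In-range values only suffer a rounding error of at most `scale / 2`, but the out-of-range value 3.0 comes back as roughly 1.0, a large error.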
Besides, since some ops like MaxPool3d aren't supported by the quantized backend yet, I wrapped them with dequant()/quant() in forward():
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        self.pool = nn.MaxPool3d()
        ...
        self.bn = nn.BatchNorm1d()
        ...
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        ...
        x = self.dequant(x)
        x = self.pool(x)
        x = self.quant(x)
        ...
        x = self.dequant(x)
        x = self.bn(x)
        x = self.quant(x)
        ...
        x = self.dequant(x)
        return x
```
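One thing worth checking in this pattern: in eager-mode quantization each `QuantStub` instance owns its own observer, so reusing a single `self.quant` at several points forces one `(scale, zero_point)` pair to cover activations with very different ranges, which can cost a lot of accuracy. A common pattern is one stub per re-entry point. Here is a sketch (the layers and their arguments are placeholders I made up, not your actual architecture):

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class Model(nn.Module):
    """Sketch: a separate QuantStub per re-entry point, so each one
    gets its own observer and (scale, zero_point) during calibration."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(3, 8, kernel_size=3)  # placeholder layer
        self.pool = nn.MaxPool3d(kernel_size=2)      # runs in float
        self.conv2 = nn.Conv3d(8, 8, kernel_size=3)  # placeholder layer
        self.quant_in = QuantStub()        # network input -> int8
        self.dequant_pool = DeQuantStub()  # int8 -> float before the pool
        self.quant_pool = QuantStub()      # float -> int8 after the pool
        self.dequant_out = DeQuantStub()   # final output back to float

    def forward(self, x):
        x = self.quant_in(x)
        x = self.conv1(x)
        x = self.dequant_pool(x)  # leave int8 for the unsupported op
        x = self.pool(x)
        x = self.quant_pool(x)    # re-enter int8 with its own qparams
        x = self.conv2(x)
        x = self.dequant_out(x)
        return x
```

Before `prepare()`/`convert()` the stubs are identity ops, so the float model runs unchanged; after conversion each stub point quantizes with the range it observed locally.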
After doing this, the code runs and the model size shrinks by about 75%, but accuracy drops to nearly 2%.