Hi all, I just started learning model quantization.
My model is divided into a front end (CNN-based) and a back end (GRU-based), so I used static quantization for the front end and dynamic quantization for the back end. With dynamic quantization alone, accuracy drops only slightly, which looks like a correct result. But once I add static quantization on top, accuracy declines dramatically. I don't know whether there is a problem with my usage.
Here is how I used them:
```python
model = CNN_GRU()
model.eval()
list_mix = [['...', '...']]

# dynamic quantization (GRU back end)
model = torch.quantization.quantize_dynamic(
    model, {nn.GRU, nn.Linear}, dtype=torch.qint8)

# static quantization (CNN front end)
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.backends.quantized.engine = 'fbgemm'
model_fp32_fused = torch.quantization.fuse_modules(model, list_mix)
model_fp32_prepared = torch.quantization.prepare(model_fp32_fused)

# calibration pass: run representative data through the prepared model
for _, input in enumerate(loader):
    video = input.get('video')
    border = input.get('duration').float()
    model_fp32_prepared(video, border)

model_int8 = torch.quantization.convert(model_fp32_prepared)

# inference
pred = model_int8(video, border)
```
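For context on where static quantization's accuracy loss comes from: during the calibration pass the observers record each activation's float range, and `convert()` turns that range into a `(scale, zero_point)` pair. Below is a minimal pure-Python sketch of that affine uint8 round trip (my own simplified version with made-up function names, not PyTorch's internals). Note how any value outside the calibrated range gets clipped, which is one reason a calibration set that doesn't match the inference distribution can hurt accuracy badly.

```python
# Illustrative sketch of affine uint8 quantize/dequantize (simplified,
# not the actual PyTorch implementation).

def calc_qparams(min_val, max_val, qmin=0, qmax=255):
    """Map an observed float range onto the uint8 grid."""
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)  # range must cover 0
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = min(qmax, max(qmin, round(qmin - min_val / scale)))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # values outside the calibrated range are clipped here
    return min(qmax, max(qmin, round(x / scale) + zero_point))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

scale, zp = calc_qparams(-1.0, 1.0)  # pretend calibration only saw [-1, 1]
ok = dequantize(quantize(0.5, scale, zp), scale, zp)       # small rounding error
clipped = dequantize(quantize(3.0, scale, zp), scale, zp)  # clipped back near 1.0
```

In-range values only suffer a rounding error of at most `scale / 2`, but the out-of-range value 3.0 comes back as roughly 1.0, a large error.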
Besides, since some ops like MaxPool3d aren't supported by the quantized backend yet, I wrapped them with dequant()/quant() in forward():
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        ...
        self.pool = nn.MaxPool3d()
        ...
        self.bn = nn.BatchNorm1d()
        ...
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        ...
        x = self.dequant(x)
        x = self.pool(x)
        x = self.quant(x)
        ...
        x = self.dequant(x)
        x = self.bn(x)
        x = self.quant(x)
        ...
        x = self.dequant(x)
        return x
```
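One thing worth checking in this pattern: in eager-mode quantization each `QuantStub` instance owns its own observer, so reusing a single `self.quant` at several points forces one `(scale, zero_point)` pair to cover activations with very different ranges, which can cost a lot of accuracy. A common pattern is one stub per re-entry point. Here is a sketch (the layers and their arguments are placeholders I made up, not your actual architecture):

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class Model(nn.Module):
    """Sketch: a separate QuantStub per re-entry point, so each one
    gets its own observer and (scale, zero_point) during calibration."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv3d(3, 8, kernel_size=3)  # placeholder layer
        self.pool = nn.MaxPool3d(kernel_size=2)      # runs in float
        self.conv2 = nn.Conv3d(8, 8, kernel_size=3)  # placeholder layer
        self.quant_in = QuantStub()        # network input -> int8
        self.dequant_pool = DeQuantStub()  # int8 -> float before the pool
        self.quant_pool = QuantStub()      # float -> int8 after the pool
        self.dequant_out = DeQuantStub()   # final output back to float

    def forward(self, x):
        x = self.quant_in(x)
        x = self.conv1(x)
        x = self.dequant_pool(x)  # leave int8 for the unsupported op
        x = self.pool(x)
        x = self.quant_pool(x)    # re-enter int8 with its own qparams
        x = self.conv2(x)
        x = self.dequant_out(x)
        return x
```

Before `prepare()`/`convert()` the stubs are identity ops, so the float model runs unchanged; after conversion each stub point quantizes with the range it observed locally.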
After doing this, the code runs and the model size shrinks by about 75%, but accuracy drops to nearly 2%.