Hi there!
I am currently trying to quantize an EfficientNet multi-head model (from timm) using the Post Training Static Quantization approach described in the PyTorch documentation (eager mode).
Unfortunately, the model's accuracy drops significantly after quantization (from 90% to 49%).
For simplicity, I am only showing a single-head version of the model (a few of the InvertedResidual layers are cut out due to the post's character limit, indicated by "…"); this simplified version also performs very poorly after quantization.
EffNet_multihead(
(features): Sequential(
(0): Conv2dSame(3, 24, kernel_size=(3, 3), stride=(2, 2), bias=False)
(1): QuantStub()
(2): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(3): DeQuantStub()
(4): SiLU(inplace=True)
(5): Sequential(
(0): ConvBnAct(
(conv): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(1): ConvBnAct(
(conv): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
(6): Sequential(
(0): EdgeResidual(
(conv_exp): Conv2dSame(24, 96, kernel_size=(3, 3), stride=(2, 2), bias=False)
(bn1): BatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(96, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(1): EdgeResidual(
(conv_exp): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(2): EdgeResidual(
(conv_exp): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(3): EdgeResidual(
(conv_exp): Conv2d(48, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
(7): Sequential(
(0): EdgeResidual(
(conv_exp): Conv2dSame(48, 192, kernel_size=(3, 3), stride=(2, 2), bias=False)
(bn1): BatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(1): EdgeResidual(
(conv_exp): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(2): EdgeResidual(
(conv_exp): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(3): EdgeResidual(
(conv_exp): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(se): Identity()
(conv_pwl): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
(8): Sequential(
(0): InvertedResidual(
(conv_pw): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2dSame(256, 256, kernel_size=(3, 3), stride=(2, 2), groups=256, bias=False)
(bn2): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(256, 16, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(16, 256, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(1): InvertedResidual(
(conv_pw): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
(bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(32, 512, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(2): InvertedResidual(
(conv_pw): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
(bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(32, 512, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(3): InvertedResidual(
(conv_pw): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512, bias=False)
(bn2): BatchNorm2d(512, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(32, 512, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(128, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
...
(6): InvertedResidual(
(conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(7): InvertedResidual(
(conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(8): InvertedResidual(
(conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
)
(head0): EffNet_head(
(features): Sequential(
(0): InvertedResidual(
(conv_pw): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2dSame(960, 960, kernel_size=(3, 3), stride=(2, 2), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(960, 40, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(40, 960, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(960, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(1): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(2): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(3): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(4): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(5): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(6): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(7): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(8): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(9): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(10): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(11): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(12): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(13): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
(14): InvertedResidual(
(conv_pw): Conv2d(256, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act1): SiLU(inplace=True)
(conv_dw): Conv2d(1536, 1536, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1536, bias=False)
(bn2): BatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(act2): SiLU(inplace=True)
(se): SqueezeExcite(
(conv_reduce): Conv2d(1536, 64, kernel_size=(1, 1), stride=(1, 1))
(act1): SiLU(inplace=True)
(conv_expand): Conv2d(64, 1536, kernel_size=(1, 1), stride=(1, 1))
(gate): Sigmoid()
(quant): QuantStub()
(dequant): DeQuantStub()
)
(conv_pwl): Conv2d(1536, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
(conv_head): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): ReLU(inplace=True)
(drop): Dropout(p=0.5, inplace=False)
(global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
(classifier): Linear(in_features=1280, out_features=2, bias=True)
(quant): QuantStub()
(dequant): DeQuantStub()
)
)
Please ignore the order in which the quant and dequant layers appear in this printout; in the forward pass of each block they are invoked in the correct order.
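To make the ordering concrete, here is a simplified stand-in for one ConvBnAct block (the real timm block differs; this is only a sketch of where my stubs sit in the forward pass). I dequantize before the SiLU because SiLU has no quantized kernel in eager mode:

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class ConvBnAct(nn.Module):
    """Simplified sketch of one block from the printout above."""
    def __init__(self, ch=24):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.act1 = nn.SiLU(inplace=True)
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        # Quantize before the conv/bn, dequantize before the SiLU,
        # which runs in floating point.
        x = self.quant(x)
        x = self.conv(x)
        x = self.bn1(x)
        x = self.dequant(x)
        return self.act1(x)
```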
I have tried quantizing only parts of the model and noticed that the largest accuracy drop comes from the InvertedResidual blocks.
I suspect this is due to how deep the network is, and that it might require Quantization Aware Training, but I am not sure.
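If it does come to that, my understanding is that the eager-mode QAT recipe differs mainly in using prepare_qat on a training-mode model and fine-tuning before convert. A sketch with a toy stand-in model (the qconfig choice and the dummy fine-tune loop are assumptions, not my actual training code):

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

# Toy stand-in for the real network.
model = nn.Sequential(QuantStub(), nn.Conv2d(3, 8, 3, padding=1), DeQuantStub())
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')

model.train()
torch.quantization.prepare_qat(model, inplace=True)

# Fine-tune for a few steps with fake-quant in the loop
# (dummy loss here, purely illustrative).
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(2):
    out = model(torch.randn(4, 3, 16, 16))
    loss = out.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
model = torch.quantization.convert(model)
```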
To quantize the model once it is generated, I run the following:
torch.quantization.prepare(model, inplace=True)
model.eval()
model.to('cpu')
I then pass a sample dataset through it to calibrate it, and run
model = torch.quantization.convert(model)
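Spelled out end to end, in the same order as above, with a toy stand-in model and a random calibration batch (the real model and data loader are too long to include; the fbgemm qconfig is what I believe I am using, set earlier in my setup code):

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

# Toy stand-in for the real network: quant -> conv -> dequant.
model = nn.Sequential(QuantStub(), nn.Conv2d(3, 8, 3, padding=1), DeQuantStub())

# qconfig must be set before prepare(), otherwise no observers are inserted.
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

torch.quantization.prepare(model, inplace=True)
model.eval()
model.to('cpu')

# Calibration: pass representative data through the prepared model
# so the observers record activation ranges.
with torch.no_grad():
    model(torch.randn(4, 3, 32, 32))

model = torch.quantization.convert(model)
```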
I know that the code above does not fuse any layers; however, performance is still very poor with fusion enabled, so I am trying to resolve the issue without it.
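For reference, when I did try fusion it was along these lines, fusing only conv + bn pairs, since SiLU is not a supported fusion pattern in eager mode (the block here is a minimal stand-in; names follow the printout):

```python
import torch
import torch.nn as nn
from torch.quantization import fuse_modules

# Minimal stand-in for a ConvBnAct block from the printout.
class ConvBnAct(nn.Module):
    def __init__(self, ch=24):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.act1 = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act1(self.bn1(self.conv(x)))

block = ConvBnAct().eval()  # conv+bn fusion requires eval mode
# Fold bn1 into conv; the fused-away module is replaced with Identity.
fuse_modules(block, [['conv', 'bn1']], inplace=True)
```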
Is there anything I am doing that might stand out as incorrect? Please let me know if there is any information missing here.
PyTorch version: 1.10.1
Torchvision version: 0.11.2
Thank you.