Hi @jerryzh168,
Thanks for helping out.
- I tried running my model on my local system (AMD64 processor, Ubuntu 20.04, Conda environment with PyTorch 1.9.1) as well as on a Google Colab instance. The results are similar in both cases.
- I am using the `fbgemm` config parameter when quantizing the model. The quantized model is ~4x smaller and achieves the same accuracy on the test set as the float model, but there is no inference speed improvement.
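For reference, a minimal sketch of how the latency comparison can be timed on CPU. The helper name `mean_latency_ms`, the toy model, and the input size are assumptions made for illustration; they are not the actual segmentation model or data. Printing the active quantized engine first is a quick way to confirm which backend is in use on the machine.

```python
import time

import torch
import torch.nn as nn


def mean_latency_ms(model, example, warmup=3, iters=10):
    """Return the mean forward-pass latency in milliseconds (hypothetical helper)."""
    model.eval()
    with torch.no_grad():
        # Warm-up runs so one-time setup cost is not counted.
        for _ in range(warmup):
            model(example)
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iters


# Show which quantized backends this build supports and which one is active.
print(torch.backends.quantized.supported_engines, torch.backends.quantized.engine)

# Toy stand-in for the float model; the quantized model would be timed the same way.
float_model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU())
example = torch.randn(1, 3, 64, 64)
latency = mean_latency_ms(float_model, example)
print(f"float model: {latency:.3f} ms per forward pass")
```

Timing both models with the same input, thread count, and warm-up keeps the comparison fair.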
This is the output when I print the quantized model.
deeplabv3_cityScapes(
(backbone): DeepLabV3(
(backbone): IntermediateLayerGetter(
(conv1): QuantizedConvReLU2d(3, 64, kernel_size=(7, 7), stride=(2, 2), scale=0.09285419434309006, zero_point=0, padding=(3, 3))
(bn1): Identity()
(relu): Identity()
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): QuantizedConv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.14764662086963654, zero_point=77)
(bn1): Identity()
(conv2): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.32371971011161804, zero_point=56, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(64, 256, kernel_size=(1, 1), stride=(1, 1), scale=0.24840933084487915, zero_point=0)
(bn3): Identity()
(relu): Identity()
(downsample): Sequential(
(0): QuantizedConv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), scale=0.13252848386764526, zero_point=69)
(1): Identity()
)
(skip_add): QFunctional(
scale=0.3724604547023773, zero_point=27
(activation_post_process): Identity()
)
)
(1): Bottleneck(
(conv1): QuantizedConv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.4247148036956787, zero_point=55)
(bn1): Identity()
(conv2): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.4553159773349762, zero_point=67, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(64, 256, kernel_size=(1, 1), stride=(1, 1), scale=0.25678393244743347, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=0.5840356945991516, zero_point=23
(activation_post_process): Identity()
)
)
(2): Bottleneck(
(conv1): QuantizedConv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), scale=0.42139047384262085, zero_point=71)
(bn1): Identity()
(conv2): QuantizedConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), scale=0.4264393150806427, zero_point=54, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(64, 256, kernel_size=(1, 1), stride=(1, 1), scale=0.19536954164505005, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=0.509103000164032, zero_point=19
(activation_post_process): Identity()
)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): QuantizedConv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), scale=0.45401355624198914, zero_point=60)
(bn1): Identity()
(conv2): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), scale=0.5511052012443542, zero_point=51, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(128, 512, kernel_size=(1, 1), stride=(1, 1), scale=0.44677823781967163, zero_point=0)
(bn3): Identity()
(relu): Identity()
(downsample): Sequential(
(0): QuantizedConv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), scale=0.5842137336730957, zero_point=58)
(1): Identity()
)
(skip_add): QFunctional(
scale=0.9698345065116882, zero_point=37
(activation_post_process): Identity()
)
)
(1): Bottleneck(
(conv1): QuantizedConv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), scale=0.6305021643638611, zero_point=57)
(bn1): Identity()
(conv2): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.5440534949302673, zero_point=63, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(128, 512, kernel_size=(1, 1), stride=(1, 1), scale=0.2438904494047165, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=1.0002213716506958, zero_point=35
(activation_post_process): Identity()
)
)
(2): Bottleneck(
(conv1): QuantizedConv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), scale=0.5003912448883057, zero_point=53)
(bn1): Identity()
(conv2): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.5733075141906738, zero_point=56, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(128, 512, kernel_size=(1, 1), stride=(1, 1), scale=0.2752479612827301, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=0.9422421455383301, zero_point=35
(activation_post_process): Identity()
)
)
(3): Bottleneck(
(conv1): QuantizedConv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), scale=0.5504205822944641, zero_point=58)
(bn1): Identity()
(conv2): QuantizedConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), scale=0.721775472164154, zero_point=60, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(128, 512, kernel_size=(1, 1), stride=(1, 1), scale=0.35953018069267273, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=1.1053467988967896, zero_point=29
(activation_post_process): Identity()
)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): QuantizedConv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), scale=0.8592106699943542, zero_point=64)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=1.3425265550613403, zero_point=59, padding=(1, 1))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=1.0867135524749756, zero_point=0)
(bn3): Identity()
(relu): Identity()
(downsample): Sequential(
(0): QuantizedConv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), scale=1.2045390605926514, zero_point=55)
(1): Identity()
)
(skip_add): QFunctional(
scale=1.6519306898117065, zero_point=40
(activation_post_process): Identity()
)
)
(1): Bottleneck(
(conv1): QuantizedConv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), scale=1.3347007036209106, zero_point=58)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=1.9396781921386719, zero_point=60, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=1.0596840381622314, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=1.878090500831604, zero_point=35
(activation_post_process): Identity()
)
)
(2): Bottleneck(
(conv1): QuantizedConv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), scale=1.13986337184906, zero_point=58)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=2.240360736846924, zero_point=48, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=0.8806281685829163, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=2.5012898445129395, zero_point=25
(activation_post_process): Identity()
)
)
(3): Bottleneck(
(conv1): QuantizedConv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), scale=1.59221613407135, zero_point=57)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=3.3066182136535645, zero_point=51, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=1.3484315872192383, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=2.731464385986328, zero_point=21
(activation_post_process): Identity()
)
)
(4): Bottleneck(
(conv1): QuantizedConv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), scale=2.1792967319488525, zero_point=58)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=3.85483717918396, zero_point=57, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=3.908252716064453, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=5.5938615798950195, zero_point=15
(activation_post_process): Identity()
)
)
(5): Bottleneck(
(conv1): QuantizedConv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), scale=3.8774356842041016, zero_point=63)
(bn1): Identity()
(conv2): QuantizedConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=7.216492176055908, zero_point=53, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), scale=3.1228652000427246, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=5.101546287536621, zero_point=14
(activation_post_process): Identity()
)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): QuantizedConv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), scale=5.755306720733643, zero_point=88)
(bn1): Identity()
(conv2): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=9.84350299835205, zero_point=50, padding=(2, 2), dilation=(2, 2))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), scale=10.301526069641113, zero_point=0)
(bn3): Identity()
(relu): Identity()
(downsample): Sequential(
(0): QuantizedConv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), scale=6.859118461608887, zero_point=70)
(1): Identity()
)
(skip_add): QFunctional(
scale=16.29696273803711, zero_point=32
(activation_post_process): Identity()
)
)
(1): Bottleneck(
(conv1): QuantizedConv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), scale=12.506922721862793, zero_point=62)
(bn1): Identity()
(conv2): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=13.2088041305542, zero_point=64, padding=(4, 4), dilation=(4, 4))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), scale=9.620034217834473, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=23.53208351135254, zero_point=31
(activation_post_process): Identity()
)
)
(2): Bottleneck(
(conv1): QuantizedConv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), scale=12.823467254638672, zero_point=68)
(bn1): Identity()
(conv2): QuantizedConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), scale=24.23908233642578, zero_point=63, padding=(4, 4), dilation=(4, 4))
(bn2): Identity()
(conv3): QuantizedConvReLU2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), scale=17.22470474243164, zero_point=0)
(bn3): Identity()
(relu): Identity()
(skip_add): QFunctional(
scale=34.330257415771484, zero_point=21
(activation_post_process): Identity()
)
)
)
)
(classifier): DeepLabHead(
(0): ASPP(
(convs): ModuleList(
(0): Sequential(
(0): QuantizedConvReLU2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), scale=8.72632122039795, zero_point=0)
(1): Identity()
(2): Identity()
)
(1): ASPPConv(
(0): QuantizedConvReLU2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), scale=13.005331993103027, zero_point=0, padding=(12, 12), dilation=(12, 12))
(1): Identity()
(2): Identity()
)
(2): ASPPConv(
(0): QuantizedConvReLU2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), scale=7.3559088706970215, zero_point=0, padding=(24, 24), dilation=(24, 24))
(1): Identity()
(2): Identity()
)
(3): ASPPConv(
(0): QuantizedConvReLU2d(2048, 256, kernel_size=(3, 3), stride=(1, 1), scale=9.811503410339355, zero_point=0, padding=(36, 36), dilation=(36, 36))
(1): Identity()
(2): Identity()
)
(4): ASPPPooling(
(0): AdaptiveAvgPool2d(output_size=1)
(1): QuantizedConvReLU2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), scale=15.513148307800293, zero_point=0)
(2): Identity()
(3): Identity()
)
)
(project): Sequential(
(0): QuantizedConvReLU2d(1280, 256, kernel_size=(1, 1), stride=(1, 1), scale=10.449341773986816, zero_point=0)
(1): Identity()
(2): Identity()
(3): Dropout(p=0.5, inplace=False)
)
)
(1): QuantizedConvReLU2d(256, 256, kernel_size=(3, 3), stride=(1, 1), scale=8.116198539733887, zero_point=0, padding=(1, 1))
(2): Identity()
(3): Identity()
(4): QuantizedConv2d(256, 10, kernel_size=(1, 1), stride=(1, 1), scale=32.9682502746582, zero_point=78)
)
)
(quant): Quantize(scale=tensor([0.0402]), zero_point=tensor([62]), dtype=torch.quint8)
(dequant): DeQuantize()
)
Strangely, when I fuse the ReLU layers together with the Conv2d and BatchNorm2d layers, the output is all zeros. When I fuse only the Conv2d and BatchNorm2d layers, the output is similar to that of the float model.
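A minimal sketch of the two fusion variants on a toy module, checked at the float level before any quantization. The module `ConvBnRelu` and its layer names are assumptions for illustration only, not the DeepLabV3 model above. If both fused float models match the unfused reference, as they do here, the fusion step itself is probably not what produces the zeros, and calibration or conversion would be the next place to look.

```python
import torch
import torch.nn as nn
import torch.quantization


class ConvBnRelu(nn.Module):
    """Toy Conv2d -> BatchNorm2d -> ReLU block (hypothetical stand-in)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


torch.manual_seed(0)
model = ConvBnRelu().eval()  # Conv+BN folding for inference requires eval mode
# Give BatchNorm non-trivial statistics so the folding is actually exercised.
model.bn.running_mean.normal_()
model.bn.running_var.uniform_(0.5, 2.0)

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    reference = model(x)

# fuse_modules returns a fused copy by default (inplace=False).
conv_bn = torch.quantization.fuse_modules(model, [["conv", "bn"]])
conv_bn_relu = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])

with torch.no_grad():
    out_conv_bn = conv_bn(x)
    out_conv_bn_relu = conv_bn_relu(x)

print(conv_bn_relu)
print(torch.allclose(reference, out_conv_bn, atol=1e-5))
print(torch.allclose(reference, out_conv_bn_relu, atol=1e-5))
```

After the three-way fusion, `bn` and `relu` are replaced by `Identity`, which matches the `Identity()` entries in the printed model above.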
What do you mean by a symbolically traceable model? What are the requirements for a model to be symbolically traceable?
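To make the question concrete, here is a small sketch of my current understanding: a model counts as symbolically traceable if `torch.fx.symbolic_trace` can record its `forward` without needing real tensor values. The two toy modules and the helper `is_symbolically_traceable` are hypothetical names introduced for illustration, and this is not a full statement of the requirements.

```python
import torch
import torch.fx
import torch.nn as nn


class StaticFlow(nn.Module):
    """Forward pass with no branching on tensor values."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.conv(x))


class DataDependentFlow(nn.Module):
    """Forward pass that branches on the value of the input tensor."""

    def forward(self, x):
        if x.sum() > 0:  # needs a concrete value, which tracing does not have
            return x
        return -x


def is_symbolically_traceable(module):
    """Return True if torch.fx.symbolic_trace succeeds (hypothetical helper)."""
    try:
        torch.fx.symbolic_trace(module)
        return True
    except torch.fx.proxy.TraceError:
        return False


static_ok = is_symbolically_traceable(StaticFlow())
dynamic_ok = is_symbolically_traceable(DataDependentFlow())
print(static_ok, dynamic_ok)
```

The second module is rejected because its `if` depends on tensor data, which is the kind of pattern I suspect matters for my model.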
I saw the tutorial on Graph Mode Static Quantization earlier but had some doubts. The approach seems similar to eager mode quantization, but with the helper functions `prepare_fx` and `convert_fx`. Do we need to specify the modules to be fused inside `qconfig_dict`, or do the functions detect the fusable modules automatically?