Shall I remove the BN and ReLU in the C inference process?

I defined my model as follows:

def forward(self, x):
    x = self.quant(x)
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.act1(x)
    x = self.conv2(x)
    x = self.bn2(x)
    x = self.act2(x)
    x = self.conv3(x)
    x = self.dequant(x)
    return x

and I set bias=False for all Conv layers.
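For context, I fused and quantized roughly like this (a minimal sketch of eager-mode static quantization; my actual qconfig/calibration code is omitted here, so details may differ):

import torch
from torch.ao.quantization import fuse_modules, get_default_qconfig, prepare, convert

model = UNetSimplified().eval()
model = fuse_modules(model, [["conv1", "bn1", "act1"], ["conv2", "bn2", "act2"]])
model.qconfig = get_default_qconfig("fbgemm")   # fbgemm backend -> per-channel weights
prepared = prepare(model)
# ... feed calibration batches through `prepared` here ...
quantized = convert(prepared)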
After fusing the model and quantizing it per-channel, it looks like this:
UNetSimplified(
  (conv1): QuantizedConvReLU2d(1, 8, kernel_size=(3, 3), stride=(1, 1), scale=0.13664968311786652, zero_point=0, padding=(1, 1))
  (bn1): Identity()
  (act1): Identity()
  (conv2): QuantizedConvReLU2d(8, 8, kernel_size=(3, 3), stride=(1, 1), scale=0.026323270052671432, zero_point=0, padding=(1, 1))
  (conv3): QuantizedConv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), scale=0.4080895185470581, zero_point=111, padding=(1, 1), bias=False)
  (bn2): Identity()
  (act2): Identity()
  (quant): Quantize(scale=tensor([10.0395]), zero_point=tensor([22]), dtype=torch.quint8)
  (dequant): DeQuantize()
)

The per-channel quantized model's state dict is:
OrderedDict([('conv1.weight', tensor([[[[-0.0070,  0.0028, -0.0038],
      [ 0.0033,  0.0013, -0.0037],
      [-0.0003,  0.0009, -0.0036]]],

    [[[ 0.0067,  0.0057, -0.0076],
      [-0.0008,  0.0063, -0.0086],
      [ 0.0045,  0.0060,  0.0061]]],


    [[[ 0.0026, -0.0024, -0.0045],
      [-0.0042,  0.0017,  0.0019],
      [-0.0104, -0.0039,  0.0080]]],


    [[[-0.0110, -0.0009,  0.0028],
      [-0.0027,  0.0006,  0.0079],
      [ 0.0071,  0.0073,  0.0003]]],


    [[[ 0.0076,  0.0054,  0.0020],
      [ 0.0002, -0.0012, -0.0043],
      [ 0.0024,  0.0003, -0.0063]]],


    [[[ 0.0054, -0.0210, -0.0016],
      [-0.0036, -0.0062, -0.0085],
      [-0.0007, -0.0003, -0.0039]]],


    [[[-0.0018, -0.0002, -0.0006],
      [ 0.0004,  0.0013,  0.0010],
      [ 0.0009,  0.0006, -0.0004]]],


    [[[-0.0006, -0.0040, -0.0030],
      [-0.0035, -0.0028,  0.0012],
      [-0.0013,  0.0040,  0.0029]]]], size=(8, 1, 3, 3), dtype=torch.qint8,
   quantization_scheme=torch.per_channel_affine,
   scale=tensor([5.4946e-05, 6.6890e-05, 8.1454e-05, 8.6305e-05, 5.9863e-05, 1.6422e-04,
    1.3998e-05, 3.1696e-05], dtype=torch.float64),
   zero_point=tensor([0, 0, 0, 0, 0, 0, 0, 0]), axis=0)), ('conv1.bias', Parameter containing:
tensor([ 5.8286,  1.4856,  0.2724,  0.0426,  0.3071, -0.2796,  8.4139,  1.3693],
       requires_grad=True)), ('conv1.scale', tensor(0.1366)), ('conv1.zero_point', tensor(0)), ('conv2.weight', tensor([[[[ 0.0150,  0.0986,  0.1136],
      [ 0.0064,  0.1039,  0.0300],
      [-0.0032,  0.0075,  0.0236]],

     [[-0.0332,  0.0246, -0.0043],
      [-0.0879, -0.0075, -0.0214],
      [ 0.0032,  0.0375,  0.0289]],

     [[ 0.0246,  0.0697, -0.0386],
      [ 0.0289,  0.0375,  0.0246],
      [ 0.0547,  0.0150,  0.0632]],

     [[ 0.0021,  0.0075, -0.0043],
      [-0.0868, -0.0793, -0.1361],
      [ 0.0246,  0.0589, -0.0150]],

     [[-0.1168,  0.0086, -0.0193],
      [-0.0139, -0.0375,  0.0311],
      [-0.0321, -0.0289,  0.0493]],

     [[ 0.0332,  0.0129,  0.0204],
      [ 0.0150, -0.0364, -0.0129],
      [-0.0021,  0.0032,  0.0343]],

     [[-0.1168, -0.0107,  0.0150],
      [-0.1275,  0.0804,  0.0493],
      [-0.0729,  0.0118,  0.0118]],

     [[-0.0643, -0.0171,  0.0450],
      [-0.0397, -0.0343, -0.0279],
      [-0.0064, -0.0407,  0.0021]]],


    [[[ 0.0041,  0.0061,  0.0143],
      [-0.0010, -0.0143, -0.0266],
      [ 0.0583,  0.0481, -0.0665]],

     [[ 0.0389, -0.0624, -0.0430],
      [-0.0102,  0.0000,  0.0194],
      [-0.0553, -0.0235, -0.0143]],

     [[ 0.0051, -0.0829,  0.0317],
      [-0.0225, -0.0860, -0.0225],
      [ 0.0389,  0.0102,  0.1157]],

     [[-0.0082, -0.0102,  0.0358],
      [ 0.0133, -0.0123, -0.0246],
      [-0.0072, -0.1013,  0.0041]],

     [[ 0.0051, -0.0420, -0.0583],
      [-0.0246, -0.0174,  0.0225],
      [-0.0051, -0.0031, -0.0123]],

     [[ 0.0143, -0.0061,  0.1177],
      [-0.0532, -0.1310, -0.1085],
      [-0.0553, -0.1228, -0.0676]],

     [[-0.0164, -0.0061, -0.0676],
      [ 0.0205,  0.0542, -0.0276],
      [-0.0041,  0.0113,  0.0051]],

     [[-0.0225,  0.0522,  0.0727],
      [-0.0246,  0.0143,  0.0420],
      [-0.0491, -0.0450, -0.0471]]],


    [[[-0.0068,  0.0155,  0.0429],
      [-0.0261, -0.0199,  0.0168],
      [-0.0224, -0.0267,  0.0050]],

     [[-0.0075,  0.0062,  0.0186],
      [-0.0143,  0.0012,  0.0112],
      [-0.0373,  0.0000,  0.0112]],

     [[ 0.0143, -0.0522, -0.0211],
      [ 0.0454,  0.0093,  0.0609],
      [ 0.0168, -0.0348,  0.0019]],

     [[ 0.0099,  0.0068, -0.0137],
      [ 0.0162,  0.0292,  0.0323],
      [-0.0025,  0.0025,  0.0186]],

     [[-0.0255, -0.0205, -0.0217],
      [-0.0149, -0.0758, -0.0441],
      [-0.0050, -0.0155, -0.0068]],

     [[-0.0280,  0.0236,  0.0510],
      [-0.0311, -0.0503, -0.0317],
      [ 0.0075, -0.0273, -0.0217]],

     [[-0.0168, -0.0062,  0.0137],
      [-0.0360,  0.0261,  0.0298],
      [-0.0242,  0.0298,  0.0174]],

     [[-0.0391,  0.0000,  0.0168],
      [-0.0789, -0.0323,  0.0025],
      [-0.0410,  0.0292,  0.0404]]],


    [[[ 0.0144,  0.0108,  0.0156],
      [ 0.0057,  0.0132,  0.0183],
      [-0.0303, -0.0063, -0.0078]],

     [[ 0.0087, -0.0255, -0.0132],
      [ 0.0156, -0.0162, -0.0078],
      [ 0.0303,  0.0051, -0.0069]],

     [[ 0.0051, -0.0207, -0.0384],
      [ 0.0213,  0.0216, -0.0201],
      [ 0.0120,  0.0162,  0.0120]],

     [[ 0.0018,  0.0006, -0.0072],
      [-0.0006, -0.0039, -0.0198],
      [ 0.0129,  0.0021, -0.0216]],

     [[ 0.0195, -0.0150, -0.0180],
      [ 0.0348, -0.0039,  0.0033],
      [ 0.0261,  0.0204,  0.0048]],

     [[-0.0045, -0.0048, -0.0075],
      [ 0.0027,  0.0015, -0.0060],
      [ 0.0078,  0.0039,  0.0024]],

     [[-0.0141, -0.0189, -0.0234],
      [ 0.0003,  0.0060,  0.0237],
      [ 0.0063,  0.0156,  0.0267]],

     [[ 0.0201,  0.0261,  0.0033],
      [ 0.0066,  0.0096,  0.0072],
      [ 0.0078,  0.0096,  0.0120]]],


    [[[-0.0063,  0.0288,  0.0253],
      [-0.0040, -0.0069,  0.0127],
      [-0.0132, -0.0075, -0.0357]],

     [[ 0.0190, -0.0046, -0.0035],
      [ 0.0063, -0.0081, -0.0092],
      [-0.0086,  0.0098,  0.0040]],

     [[ 0.0023, -0.0737,  0.0150],
      [-0.0230,  0.0006, -0.0132],
      [-0.0213, -0.0058, -0.0178]],

     [[-0.0173,  0.0023,  0.0098],
      [-0.0345, -0.0127,  0.0029],
      [-0.0104, -0.0040,  0.0069]],

     [[ 0.0092, -0.0006, -0.0092],
      [ 0.0213,  0.0023, -0.0213],
      [ 0.0178, -0.0035, -0.0155]],

     [[-0.0224, -0.0121,  0.0437],
      [-0.0294, -0.0144, -0.0046],
      [ 0.0201, -0.0173, -0.0345]],

     [[-0.0190, -0.0063, -0.0201],
      [-0.0035,  0.0248, -0.0012],
      [-0.0224, -0.0092, -0.0178]],

     [[-0.0207,  0.0144,  0.0196],
      [-0.0115, -0.0029, -0.0006],
      [ 0.0040,  0.0035, -0.0086]]],


    [[[ 0.0007,  0.0355,  0.0586],
      [ 0.0020,  0.0020,  0.0123],
      [-0.0089, -0.0252, -0.0150]],

     [[-0.0143,  0.0027, -0.0170],
      [-0.0211,  0.0048,  0.0102],
      [-0.0389,  0.0041,  0.0184]],

     [[ 0.0157, -0.0559, -0.0211],
      [ 0.0191, -0.0075, -0.0341],
      [-0.0123,  0.0286, -0.0334]],

     [[ 0.0109,  0.0048, -0.0116],
      [ 0.0014,  0.0007, -0.0157],
      [-0.0409, -0.0552, -0.0804]],

     [[-0.0334, -0.0409, -0.0873],
      [-0.0450, -0.0280, -0.0170],
      [-0.0157,  0.0266,  0.0689]],

     [[ 0.0061, -0.0143, -0.0409],
      [-0.0191, -0.0191, -0.0225],
      [-0.0348, -0.0266, -0.0525]],

     [[-0.0157, -0.0232, -0.0102],
      [-0.0055, -0.0143,  0.0232],
      [ 0.0034,  0.0157,  0.0423]],

     [[ 0.0055, -0.0245, -0.0423],
      [-0.0239, -0.0170, -0.0273],
      [-0.0136, -0.0075, -0.0198]]],


    [[[-0.0227, -0.0018, -0.0141],
      [ 0.0135,  0.0221,  0.0129],
      [ 0.0184,  0.0663, -0.0012]],

     [[ 0.0117,  0.0215,  0.0111],
      [-0.0104,  0.0000, -0.0295],
      [ 0.0080,  0.0049, -0.0154]],

     [[ 0.0012, -0.0313,  0.0209],
      [-0.0068, -0.0381, -0.0012],
      [-0.0160,  0.0289,  0.0319]],

     [[ 0.0018, -0.0172,  0.0049],
      [ 0.0117,  0.0000, -0.0129],
      [ 0.0399,  0.0270,  0.0049]],

     [[ 0.0344,  0.0227,  0.0264],
      [-0.0455, -0.0313, -0.0326],
      [-0.0375, -0.0332, -0.0166]],

     [[-0.0614, -0.0252, -0.0381],
      [-0.0037, -0.0184, -0.0338],
      [ 0.0061, -0.0233,  0.0037]],

     [[ 0.0129,  0.0166, -0.0448],
      [ 0.0227,  0.0246, -0.0780],
      [-0.0043, -0.0049, -0.0571]],

     [[-0.0018,  0.0154,  0.0031],
      [ 0.0160,  0.0178,  0.0068],
      [ 0.0080,  0.0031,  0.0043]]],


    [[[-0.0203, -0.0257, -0.0095],
      [ 0.0025,  0.0435, -0.0091],
      [-0.0029,  0.0012,  0.0029]],

     [[-0.0046,  0.0062,  0.0219],
      [-0.0178, -0.0356, -0.0174],
      [-0.0228, -0.0306,  0.0075]],

     [[ 0.0021,  0.0211,  0.0157],
      [-0.0335, -0.0530, -0.0025],
      [ 0.0211, -0.0137, -0.0145]],

     [[ 0.0162,  0.0211,  0.0203],
      [-0.0079,  0.0062, -0.0315],
      [-0.0004, -0.0058, -0.0079]],

     [[ 0.0199,  0.0037,  0.0207],
      [-0.0240, -0.0261,  0.0062],
      [-0.0190, -0.0029,  0.0066]],

     [[ 0.0203,  0.0327,  0.0075],
      [ 0.0315,  0.0186,  0.0174],
      [ 0.0050,  0.0029,  0.0170]],

     [[-0.0033, -0.0046, -0.0162],
      [ 0.0054,  0.0298, -0.0087],
      [ 0.0066,  0.0070, -0.0017]],

     [[-0.0352, -0.0157, -0.0145],
      [-0.0033,  0.0149,  0.0141],
      [-0.0025,  0.0141,  0.0248]]]], size=(8, 8, 3, 3), dtype=torch.qint8,
   quantization_scheme=torch.per_channel_affine,
   scale=tensor([0.0011, 0.0010, 0.0006, 0.0003, 0.0006, 0.0007, 0.0006, 0.0004],
   dtype=torch.float64),
   zero_point=tensor([0, 0, 0, 0, 0, 0, 0, 0]), axis=0)), ('conv2.bias', Parameter containing:
tensor([-0.3632,  1.0700, -0.6545, -0.3776,  0.8916,  0.6928,  0.6020,  0.1993],
       requires_grad=True)), ('conv2.scale', tensor(0.0263)), ('conv2.zero_point', tensor(0)), ('conv3.weight', tensor([[[[-0.9800, -1.8201, -2.0721],
      [-1.9041, -2.5761, -2.2681],
      [-1.8761, -2.9961, -2.1561]],

     [[-0.6160, -1.9601, -1.7361],
      [-1.0080, -1.8201, -1.5121],
      [-1.4841, -1.5401, -1.7361]],

     [[ 0.9520,  1.7921, -1.7641],
      [ 1.2041,  0.1120, -1.3441],
      [-2.4921, -1.2601, -3.0522]],

     [[-1.5681, -1.5121,  0.1400],
      [-3.5842, -1.0641,  0.9520],
      [ 0.2800, -1.4841,  0.4760]],

     [[-0.9240, -2.9961, -1.1761],
      [-2.5481, -1.6801, -0.5600],
      [-0.4760, -1.9321,  0.9800]],

     [[ 0.5880,  0.7840,  0.7560],
      [ 0.1960,  0.3640,  0.5600],
      [ 0.9800,  0.3360,  0.7280]],

     [[-1.3441, -0.1960,  0.5600],
      [-2.0441, -1.0361, -1.9321],
      [-0.9520, -0.5320, -1.4841]],

     [[-3.0241, -2.0441, -1.7081],
      [-0.0280,  0.9240, -2.4361],
      [-0.1960, -0.1120, -2.4361]]]], size=(1, 8, 3, 3), dtype=torch.qint8,
   quantization_scheme=torch.per_channel_affine,
   scale=tensor([0.0280], dtype=torch.float64), zero_point=tensor([0]),
   axis=0)), ('conv3.bias', None), ('conv3.scale', tensor(0.4081)), ('conv3.zero_point', tensor(111)), ('quant.scale', tensor([10.0395])), ('quant.zero_point', tensor([22]))])

Finally, I got the quantized weight of the first conv layer (converted to numpy); the first four of the eight output channels look like:

[[[[-128   51  -69]
   [  60   24  -68]
   [  -5   16  -65]]]

 [[[ 100   85 -113]
   [ -12   94 -128]
   [  68   89   91]]]

 [[[  32  -29  -55]
   [ -51   21   23]
   [-128  -48   98]]]

 [[[-128  -10   32]
   [ -31    7   91]
   [  82   85    3]]]
 ...]

These int8 values can easily be computed from the float weight, scale, and zero point, e.g. as in the sketch below.
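For example, the int8 values can be recovered like this (a sketch, where `quantized` stands for my converted model):

import torch

qw = quantized.conv1.weight()                        # per-channel quantized weight tensor
scale = qw.q_per_channel_scales().view(-1, 1, 1, 1)
zp = qw.q_per_channel_zero_points().view(-1, 1, 1, 1)

# quantize the dequantized weight by hand and compare with the stored int8 values
w_int = torch.clamp(torch.round(qw.dequantize() / scale) + zp, -128, 127).to(torch.int8)
assert torch.equal(w_int, torch.int_repr(qw))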
What confuses me is that there are float bias values in the first and second fused conv layers, and I guess they come from the operation that fuses [conv, bn, relu].
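If I understand BN folding correctly, then with bias=False the fused bias comes entirely from the BatchNorm parameters. A sketch of the standard folding formula (where `bn` is the pre-fusion BatchNorm2d):

import torch

def fold_bn(conv_w, bn):
    # standard Conv+BN folding; the conv itself is bias-free in my model
    std = torch.sqrt(bn.running_var + bn.eps)
    w_fused = conv_w * (bn.weight / std).reshape(-1, 1, 1, 1)
    b_fused = bn.bias - bn.weight * bn.running_mean / std
    return w_fused, b_fused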

Does the float bias in the fused conv layer still behave the same as in the original nn.Conv2d? (In my understanding, all values involved in the inference process should be qint8, so how does the float bias participate in the calculation?)
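To make the confusion concrete, here is my mental model of one output value (a Python sketch of what my C code would do; the qparams are taken from conv1 above, and the variable names are mine):

import numpy as np

x_q = np.array([30, 25, 22], dtype=np.int64)       # some quint8 input values
w_q = np.array([-5, 7, 12], dtype=np.int64)        # qint8 weights of one output channel
x_zp, s_x = 22, 10.0395                            # quant.zero_point, quant.scale
s_w, bias = 5.4946e-05, 5.8286                     # conv1 channel-0 weight scale and bias
s_y, y_zp = 0.13665, 0                             # conv1 output scale and zero_point

acc = np.sum((x_q - x_zp) * w_q)                   # integer-only accumulation
y = acc * (s_x * s_w) + bias                       # is the float bias added here...
y_q = np.clip(np.round(y / s_y) + y_zp, 0, 255)    # ...before requantizing to quint8?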
What's more, are ('conv1.scale', tensor(0.1366)) and ('conv1.zero_point', tensor(0)) the qparams of the ReLU output? How are they computed in the fused model?
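My guess is that they are observed qparams of the post-ReLU activation, computed by an observer roughly like this (a MinMaxObserver-style sketch; the observer in the actual default qconfig may differ):

import torch

def minmax_qparams(x, qmin=0, qmax=255):
    lo = min(x.min().item(), 0.0)                  # the range is forced to include zero
    hi = max(x.max().item(), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

# after a ReLU the minimum is 0, which would explain zero_point == 0
print(minmax_qparams(torch.relu(torch.randn(1000))))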

As a supplement: before quantization, I wrote C code for the inference process, following the forward function (conv, bn, relu, conv, bn, relu, conv). Should I remove the BN and ReLU functions now that they are fused into the ConvReLU2d layers, i.e. collapse the C pipeline to quantize → conv1(+relu) → conv2(+relu) → conv3 → dequantize?
Thanks a lot for any help.

Answered here: How is quantization of activations handled in pytorch after QAT? - #12 by HDCharles