Hi, all.
I have some questions about quantization.
When I quantize a model, I get the following results.
I understand that the scale and zero_point of the torch.per_channel_affine scheme are what turn the weights into int8.
Then, what do fc1.scale and fc1.zero_point mean?
Also, I wonder which scale and zero_point are used to quantize the input (activation).
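To check my understanding, here is a rough sketch of how I think the affine scheme works. The scale and zero_point numbers below are made up purely for illustration (not taken from my model), and my guess about what fc1.scale refers to may well be wrong:

import torch

# My understanding: q = clamp(round(x / scale) + zero_point, qmin, qmax)
#                   x ≈ (q - zero_point) * scale
x = torch.randn(4, 8)

# Per-tensor quantization (what I am guessing fc1.scale / fc1.zero_point
# might be used for, i.e. the output activation of fc1 -- is that right?)
x_q = torch.quantize_per_tensor(x, scale=0.1, zero_point=44, dtype=torch.quint8)
print(x_q.int_repr())    # raw uint8 values
print(x_q.dequantize())  # approximately x again

# Per-channel quantization (what the fc1 weight below uses: one scale and
# one zero_point per output channel)
w = torch.randn(4, 8)
scales = (torch.rand(4) * 0.05 + 0.01).double()
zero_points = torch.zeros(4, dtype=torch.int64)
w_q = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)
print(w_q.q_per_channel_scales())
print(w_q.q_per_channel_zero_points())
print(w_q.int_repr())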
import torch

# Attach the QAT config and insert fake-quant/observer modules
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model_q = torch.quantization.prepare_qat(model, inplace=False)
model_q.load_state_dict(torch.load('220307_quant_mnist.pt'))
model_q.eval().to('cpu')  # eval mode on CPU before converting

# Convert to a real int8 model and save its state dict
model_int8 = torch.quantization.convert(model_q, inplace=False)
torch.save(model_int8.state_dict(), './220307_quantized_mnist.pt')
test()

# Inspect the saved quantized parameters
params = torch.load('220307_quantized_mnist.pt', map_location="cpu")
print(params)
print(params['fc1._packed_params._packed_params'][0].int_repr())
OrderedDict ([('fc1.scale', tensor(8.2687topic deleted by author)), ('fc1.zero_point', tensor(44)), ('fc1._packed_params.dtype', torch.qint8), ('fc1._packed_params._packed_params', (tensor([[-0.5753, -0.6056, -0.6056, ..., -0.5753, -0.6056, -0.6056],
[ 0.0289, 0.0289, 0.0000, ..., 0.0867, 0.0867, 0.0289],
[ 0.2502, 0.0834, 0.0556, ..., 0.0556, 0.1390, 0.0556],
...,
[-0.0295, -0.0295, 0.0000, ..., 0.0000, 0.0000, -0.0295],
[-0.0677, 0.0000, -0.0677, ..., -0.0338, 0.0000, -0.0677],
[ 0.0000, 0.0000, 0.0625, ..., 0.0937, 0.0937, 0.0000]],
size=(512, 784), dtype=torch.qint8,
quantization_scheme=torch.per_channel_affine,
scale=tensor([0.0303, 0.0289, 0.0278, 0.0299, 0.0323, 0.0261, 0.0319, 0.0226, 0.0196,
0.0299, 0.0369, 0.0324, 0.0264, 0.0330, 0.0228, 0.0257, 0.0376, 0.0149,
0.0333, 0.0402, 0.0261, 0.0377, 0.0397, 0.0280, 0.0422, 0.0287, 0.0244,
0.0251, 0.0341, 0.0225, 0.0246, 0.0277, 0.0177, 0.0381, 0.0314, 0.0341,
0.0274, 0.0269, 0.0248, 0.0344, 0.0365, 0.0262, 0.0267, 0.0302, 0.0271,
0.0299, 0.0246, 0.0228, 0.0263, 0.0267, 0.0329, 0.0165, 0.0367, 0.0304,