I want to implement a quantized network in pure C. One of my goals is to gain a full understanding of how operations on quantized tensors work.
I performed PTQ (post-training quantization) on a ResNet-18 architecture and got good accuracy with the fbgemm backend.
Now I am struggling to replicate the operations. I decided the simplest place to start is the addition in the ResNet block:
self.skip_add = nn.quantized.FloatFunctional()
And during inference I can add two tensors via
out1 = self.skip_add.add(x1, x2)
where x1 and x2 are tensors of type torch.Tensor, quantized with the fbgemm backend during the post-training quantization procedure.
I expected that
out2_int = x1.int_repr() + x2.int_repr()
would be the same as out1.int_repr() (probably with clamping to the needed range). However, that is not the case.
Can anyone please provide me with any information on how to implement operations with quantized tensors?
Below I dump the example outputs.
print(x1)
...,
[-0.0596, -0.0496, -0.1390, ..., -0.0596, -0.0695, -0.0099],
[-0.0893, 0.0000, -0.0695, ..., 0.0596, -0.0893, -0.0298],
[-0.1092, 0.0099, 0.0000, ..., -0.0397, -0.0794, -0.0199]]]],
size=(1, 256, 14, 14), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.009925744496285915,
zero_point=75)
print(x2)
...,
[ 0.1390, -0.1669, -0.0278, ..., -0.2225, -0.0556, -0.1112],
[ 0.0000, -0.1669, -0.0556, ..., 0.0556, 0.1112, -0.2781],
[ 0.1390, 0.1669, 0.0278, ..., 0.2225, 0.4171, 0.0834]]]],
size=(1, 256, 14, 14), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.02780967578291893,
zero_point=61)
print(x1.int_repr())
...,
[69, 70, 61, ..., 69, 68, 74],
[66, 75, 68, ..., 81, 66, 72],
[64, 76, 75, ..., 71, 67, 73]]]], dtype=torch.uint8)
print(x2.int_repr())
...,
[66, 55, 60, ..., 53, 59, 57],
[61, 55, 59, ..., 63, 65, 51],
[66, 67, 62, ..., 69, 76, 64]]]], dtype=torch.uint8)
print(self.skip_add.add(x1, x2))
...,
[ 0.0904, -0.2109, -0.1808, ..., -0.2712, -0.1205, -0.1205],
[-0.0904, -0.1808, -0.1205, ..., 0.1205, 0.0301, -0.3013],
[ 0.0301, 0.1808, 0.0301, ..., 0.1808, 0.3314, 0.0603]]]],
size=(1, 256, 14, 14), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.03012925386428833,
zero_point=56)
print(self.skip_add.add(x1, x2).int_repr())
...,
[59, 49, 50, ..., 47, 52, 52],
[53, 50, 52, ..., 60, 57, 46],
[57, 62, 57, ..., 62, 67, 58]]]], dtype=torch.uint8)
print(x1.int_repr() + x2.int_repr())
[135, 125, 121, ..., 122, 127, 131],
[127, 130, 127, ..., 144, 131, 123],
[130, 143, 137, ..., 140, 143, 137]]]], dtype=torch.uint8)
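For what it's worth, here is my current guess at what FloatFunctional.add does, sketched in plain Python with the scale/zero-point values from the dumps above: dequantize both operands, add in float, then requantize with the output's scale and zero point. It reproduces the dumped int_repr values for the elements I checked, though I assume FBGEMM does the equivalent internally with integer-only fixed-point arithmetic, so rounding could in principle differ in the last bit:

```python
def quantized_add(q1, q2, s1, z1, s2, z2, s_out, z_out):
    # Dequantize both per-tensor affine quint8 inputs and add in float
    real = s1 * (q1 - z1) + s2 * (q2 - z2)
    # Requantize to the output's scale/zero point
    q = round(real / s_out) + z_out
    # Clamp to the quint8 range
    return max(0, min(255, q))

# Scale / zero-point values taken from the dumps above
s1, z1 = 0.009925744496285915, 75
s2, z2 = 0.02780967578291893, 61
s_out, z_out = 0.03012925386428833, 56

# First visible element: x1 int 69, x2 int 66 -> dumped output is 59
print(quantized_add(69, 66, s1, z1, s2, z2, s_out, z_out))  # -> 59
```

If this is right, it would also explain why summing the int_repr values directly cannot work: x1 and x2 have different scales and zero points, so their raw integer codes are not in a common unit, and the sum is never rescaled to the output's quantization parameters.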