Hello everyone,
I am quantizing RetinaNet using the standard PyTorch methods, namely PTQ and QAT, and got great results. The model size was reduced from 139 MB to 39 MB, and the inference time on CPU from 90 min to 20 min on a large validation dataset, with an accuracy loss smaller than 1%. Although the results are great, I tried to inspect the weights of the quantized network and noticed the following: if I use
print(model.head.cls_subnet[0].conv.weight().int_repr())
I get a properly quantized integer tensor, e.g.
[[-33, 6, -56],
[-36, 47, 24],
[ 12, 1, 25]],
[[-22, 18, 22],
[-45, 43, -55],
[ 4, 1, -58]],
[[ 19, 27, 10],
[-73, 9, -53],
[ 2, -38, -24]]]], dtype=torch.int8)
But if I access the weight without int_repr()
print(model.head.cls_subnet[0].conv.weight())
I get a tensor like
[[-0.0096, 0.0017, -0.0163],
[-0.0105, 0.0137, 0.0070],
[ 0.0035, 0.0003, 0.0073]],
[[-0.0064, 0.0052, 0.0064],
[-0.0131, 0.0125, -0.0160],
[ 0.0012, 0.0003, -0.0169]],
[[ 0.0055, 0.0079, 0.0029],
[-0.0212, 0.0026, -0.0154],
[ 0.0006, -0.0111, -0.0070]]]], size=(256, 256, 3, 3),
dtype=torch.qint8, quantization_scheme=torch.per_channel_affine,
scale=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003,
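If I understand affine quantization correctly, the two printouts should relate roughly like this (a minimal sketch in plain Python, no model needed; the scale and zero_point here are illustrative values taken from the truncated printout above, so the results only approximately match):

```python
# Sketch: how a dequantized weight relates to its int8 representation
# under an affine scheme: w ≈ scale * (q - zero_point).
scale = 0.0003      # illustrative per-channel scale, rounded as in the printout
zero_point = 0      # symmetric per-channel quantization typically uses 0

int_repr = [-33, 6, -56]  # int8 values from the int_repr() printout
dequant = [scale * (q - zero_point) for q in int_repr]
print(dequant)  # each value ≈ scale * int value, close to the float printout
```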
So my question is: was the quantization done correctly, or am I still using the full-precision weights? Why does the output look like this? Is the integer tensor just an internal representation of the weights?
Thank you in advance.
Best regards,
yayapa