Greetings. I have tried two quantization approaches for the resnet50 that ships with PyTorch, with mixed results:
Dynamic quantization works, but it only applies to the single Linear layer in ResNet (the final fully connected classifier), so the resulting improvements in model size and inference latency are just a few percent.
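For reference, this is essentially what I ran for the dynamic case (a minimal sketch):

```python
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()

# Only nn.Linear is dynamically quantized; all of ResNet's conv layers are untouched.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```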
Static quantization nominally succeeds (prepare and convert run without errors), but at runtime the converted model throws the exception described in Supported quantized tensor operations, which I presume is caused by the "+" operation used to implement the skip connections. It doesn't seem feasible to exclude those from quantization, as they repeat throughout the entire depth of the model. Am I correct in concluding, then, that the ResNet implementation that ships with PyTorch cannot be (correctly) statically quantized with the current API?
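And this is roughly the static quantization path that fails for me (a sketch; the calibration loop over real data is abbreviated to a single random batch):

```python
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

torch.quantization.prepare(model, inplace=True)
# Calibration: a real run would feed representative data here.
model(torch.randn(1, 3, 224, 224))
torch.quantization.convert(model, inplace=True)

# This forward pass raises the exception, presumably at the first
# skip connection ("out += identity" in torchvision's Bottleneck),
# since "+" on quantized tensors is not supported.
model(torch.randn(1, 3, 224, 224))
```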
I understand that quantization support is marked experimental; I'd just like to confirm whether the limitations I'm seeing are expected at this stage.