Greetings. I have tried two quantization approaches on the ResNet50 that ships with PyTorch and had mixed results:
dynamic quantization works, but it only applies to the single Linear layer in ResNet, so the resulting improvements in model size and inference latency are just a few percent.
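For reference, this is roughly what I ran (a minimal sketch; the pretrained flag and dtype are just what the tutorial suggests):

```python
import torch
import torchvision

# Dynamic quantization targets only nn.Linear here, i.e. the final
# fc layer of ResNet50, which is why the gains are so small.
model = torchvision.models.resnet50(pretrained=True).eval()
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```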
static quantization nominally succeeds, but at runtime the converted model throws the exception described in Supported quantized tensor operations, which I presume is caused by the "+" operation used to implement the skip connections. Excluding those doesn't seem feasible, since they repeat throughout the entire depth of the model. Am I correct in concluding, then, that the ResNet implementation that ships with PyTorch cannot be (correctly) statically quantized through the current API?
I understand that quantization support is marked experimental; I'd just like to confirm that the limitations I am seeing are expected at this stage.
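For context, the static path I tried looks roughly like this (an eager-mode sketch following the tutorial; the single forward pass stands in for a real calibration loop):

```python
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
# Eager-mode static quantization. The stock torchvision model has no
# QuantStub/DeQuantStub and uses "+" for the skip connections, so the
# converted model fails as soon as it is run.
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 224, 224))   # stand-in for calibration
quantized = torch.quantization.convert(prepared)
quantized(torch.randn(1, 3, 224, 224))  # raises the RuntimeError below
```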
```
RuntimeError: Could not run 'quantized::conv2d' with arguments from the 'CPUTensorId' backend. 'quantized::conv2d' is only available for these backends: [QuantizedCPUTensorId].
```
Hi all, some layers are missing in the pretrained ResNet50 model. I don't see QuantStub() and DeQuantStub(). Also, `(skip_add): FloatFunctional((activation_post_process): Identity())` is missing after every layer. I guess this is what causes inference issues on the quantized model and keeps giving this error:
```
RuntimeError: Could not run 'quantized::conv2d' with arguments from the 'CPUTensorId' backend. 'quantized::conv2d' is only available for these backends: [QuantizedCPUTensorId].
```
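For comparison, in the quantizable ResNet the residual add is wrapped like this (a rough sketch based on torchvision.models.quantization, not the exact source):

```python
import torch.nn as nn

class ResidualAddSketch(nn.Module):
    # FloatFunctional carries an observer (activation_post_process),
    # so the add can be converted to quantized::add during convert();
    # a bare "+" cannot be observed or converted.
    def __init__(self):
        super().__init__()
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, out, identity):
        return self.skip_add.add(out, identity)
```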
Hi @samhithaaaa, could you share the code/script that you are using to quantize the model?
If you are using FX-based quantization you will likely not see QuantStub()/DeQuantStub() in the graph. cc @jerryzh168
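For example, FX graph mode quantization inserts the quantize/dequantize ops into the traced graph instead of relying on explicit stubs. A rough sketch (note that on newer PyTorch versions prepare_fx also takes an example_inputs argument):

```python
import torch
import torchvision
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = torchvision.models.resnet50(pretrained=True).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)  # observers inserted via FX tracing
prepared(torch.randn(1, 3, 224, 224))       # stand-in for calibration
quantized = convert_fx(prepared)            # "+" in skip connections handled here
```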