Am I correct in concluding that the ResNet that ships with PyTorch can't be quantized by PyTorch?

Greetings. I made two attempts to quantize the resnet50 that ships with PyTorch and had mixed results:

  1. dynamic quantization works, but it only applies to the single Linear layer in ResNet, so the resulting improvements in model size and inference latency are just a few percent.

  2. static quantization nominally succeeds, but at runtime the new model throws the exception described in Supported quantized tensor operations, which I presume is caused by the “+” operation used to implement skip connections. It doesn’t seem feasible to exclude those as they repeat throughout the entire depth of the model. Am I correct in deducing then that the resnet implementation that ships with pytorch cannot be (correctly) statically quantized by the current API?
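To put a rough number on point 1, here is a back-of-the-envelope sketch in plain Python. The total parameter count for resnet50 is an assumption taken from torchvision rather than from this thread, and the estimate ignores the small overhead of storing scales and zero points.

```python
# Rough size estimate: only the fc weight matrix goes from float32 to int8.
TOTAL_PARAMS = 25_557_032      # assumed torchvision resnet50 parameter count
FC_WEIGHTS = 2048 * 1000       # weight matrix of the final Linear layer
BYTES_FP32, BYTES_INT8 = 4, 1

fp32_bytes = TOTAL_PARAMS * BYTES_FP32
saved_bytes = FC_WEIGHTS * (BYTES_FP32 - BYTES_INT8)
saving = saved_bytes / fp32_bytes

print(f"float32 model size: {fp32_bytes / 1e6:.1f} MB")
print(f"saved by quantizing fc weights: {saved_bytes / 1e6:.1f} MB ({saving:.1%})")
```

Under these assumptions the saving stays in the single-digit percent range, because the convolutions hold almost all of the weights.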

I understand that quantization support is marked experimental – I’d like to confirm that the limitations I am seeing are expected at this stage.

Thank you.

Incidentally, I can reproduce the issue with a tiny test model: adding a += step to forward() makes it non-quantizable.

(BTW, I am aware of torch.nn.quantized.FloatFunctional – my use case prevents such intrusive model modifications)
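For readers wondering why the add needs special handling at all, here is a simplified illustration in plain Python. It is not PyTorch's kernel; the helper names `quantize` and `dequantize` and all the scales are made up for the example. The point is that two quantized tensors generally carry different scales, so their integer values cannot simply be summed, and the result needs its own output scale and zero point.

```python
def quantize(values, scale, zero_point):
    """Affine quantization to uint8: q = clamp(round(x / scale) + zero_point, 0, 255)."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Two activations with very different ranges, hence different scales.
scale_a, scale_b = 1 / 64, 0.25
qa = quantize([0.5, 1.0, 1.5], scale_a, 0)      # [32, 64, 96]
qb = quantize([10.0, 20.0, 30.0], scale_b, 0)   # [40, 80, 120]

# Summing the raw integers mixes two different scales and means nothing.
naive = [x + y for x, y in zip(qa, qb)]

# A correct add goes back to real values and requantizes with an output scale.
real_sum = [x + y for x, y in zip(dequantize(qa, scale_a, 0), dequantize(qb, scale_b, 0))]
out_scale, out_zero_point = 0.25, 0
q_out = quantize(real_sum, out_scale, out_zero_point)

print(naive)   # integers with no consistent scale
print(q_out)   # quantized representation of [10.5, 21.0, 31.5]
```

In a real model the output scale and zero point have to come from calibration, which is why the add has to be a module that an observer can attach to.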

At this point, eager mode quantization may require changes to the model in order to make it work. An example of how resnet50 is quantized in PyTorch is available.

Going forward, we are planning graph mode quantization, where such invasive model changes won’t be required.

Thanks. I wanted to make sure I wasn’t missing anything obvious. The pre-quantized model works because of changes such as:

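def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # FloatFunctional replaces the bare "+" so the add can be observed and quantized
    self.add_relu = torch.nn.quantized.FloatFunctional()

def forward(self, x):
    identity = x
    out = self.conv1(x)
    # fused add + ReLU for the skip connection, instead of "out += identity"
    out = self.add_relu.add_relu(out, identity)

    return out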
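For anyone who wants to try the pattern end to end, here is a minimal sketch of eager mode static quantization on a toy residual block. `TinyResidualBlock` is a hypothetical module written for this post, not the torchvision implementation, and the calibration data is random noise. It assumes a PyTorch build where the `torch.quantization` eager mode API and a quantized CPU engine are available.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical toy block with one conv and a skip connection."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()        # float -> quantized at entry
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.add_relu = torch.nn.quantized.FloatFunctional()
        self.dequant = torch.quantization.DeQuantStub()    # quantized -> float at exit

    def forward(self, x):
        x = self.quant(x)
        identity = x
        out = self.conv1(x)
        # Skip connection through FloatFunctional instead of "out += identity"
        out = self.add_relu.add_relu(out, identity)
        return self.dequant(out)


model = TinyResidualBlock().eval()
# Use the default qconfig for whichever quantized engine this build selects.
model.qconfig = torch.quantization.get_default_qconfig(torch.backends.quantized.engine)

prepared = torch.quantization.prepare(model)       # insert observers
with torch.no_grad():
    prepared(torch.randn(4, 3, 8, 8))              # calibration pass (random data here)
quantized = torch.quantization.convert(prepared)   # swap in quantized modules

with torch.no_grad():
    y = quantized(torch.randn(1, 3, 8, 8))
print(y.shape, y.dtype)
```

The output comes back as an ordinary float tensor because of the DeQuantStub at the end. With real models you would calibrate on representative inputs rather than noise.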

Hi, I keep getting this error:

RuntimeError: Could not run ‘quantized::conv2d’ with arguments from the ‘CPUTensorId’ backend. ‘quantized::conv2d’ is only available for these backends: [QuantizedCPUTensorId].

Can someone help me with this?
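This message usually means that a regular float tensor reached a quantized conv, which typically happens when the input is never quantized before the first converted layer. Whether that is the cause here is an assumption, since the model code was not posted. One way to handle it, sketched below on a made-up two-layer model, is to wrap the float model in `torch.quantization.QuantWrapper`, which adds a QuantStub at the input and a DeQuantStub at the output.

```python
import torch
import torch.nn as nn

# Hypothetical float model standing in for the poster's network.
float_model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU())

# QuantWrapper quantizes the input before the model and dequantizes the output after it.
wrapped = torch.quantization.QuantWrapper(float_model).eval()
wrapped.qconfig = torch.quantization.get_default_qconfig(torch.backends.quantized.engine)

prepared = torch.quantization.prepare(wrapped)
with torch.no_grad():
    prepared(torch.randn(4, 3, 8, 8))              # calibration with random data
quantized = torch.quantization.convert(prepared)

with torch.no_grad():
    out = quantized(torch.randn(1, 3, 8, 8))       # plain float input is accepted now
print(out.shape, out.dtype)
```

Calling the converted inner model directly with a float tensor, without the wrapper or a QuantStub, is the situation that would be expected to produce the error above.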

Any update on graph mode quantization? Facing similar issues quantizing ResNet.

Hi @Bryan_Wang, we cannot commit to a timeline yet, but we are hoping to release it as a prototype this year. You are welcome to check out the test cases demonstrating the current API, although it will be in flux for the near future and we don’t have documentation just yet.