Could not run 'aten::quantize_per_tensor'

Hello! I am trying to quantize a pretrained resnet50 model, but I am running into the following error:

NotImplementedError: Could not run 'aten::quantize_per_tensor' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::quantize_per_tensor' is only available for these backends: [CPU, CUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

I can’t figure out why this is going wrong. If someone could help me out, that would be amazing. Here is my code:

import torch
import numpy as np
import time


class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model_fp32 = model
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        
    def forward(self, x):
        x = self.quant(x)
        x = self.model_fp32(x)
        x = self.dequant(x)
        return x

model = torch.hub.load('pytorch/vision', 'resnet50', pretrained=True).to('cuda')
quant_model = QuantizedModel(model)
quant_model.eval()

quant_model.qconfig = torch.ao.quantization.default_qconfig
print(quant_model.qconfig)
quant_model = torch.ao.quantization.prepare(quant_model, inplace=False)
quant_model = torch.ao.quantization.convert(quant_model, inplace=False)

input_tensor = preprocess(img).unsqueeze(0)
input_tensor = torch.quantize_per_tensor(input_tensor, scale=1.0, zero_point=0, dtype=torch.quint8)

output_batch_tensor = quant_model(input_tensor)

I saw a thread very similar to this one, but I was not able to find the answer to my problem there.

Hi @aminooka,

Can you share the full stack trace and print your quantized model, like in this thread: Could not run 'aten::quantize_per_tensor' with arguments from the 'QuantizedCPU' backend - #3 by sarramrg

I don’t think you need to call torch.quantize_per_tensor for eager-mode static quantization. It may work if you remove that line.
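
For example (a minimal sketch, assuming preprocess, img, and quant_model are the same as in your script), the input would stay fp32 and the QuantStub inside your wrapper would handle the conversion:

# Keep the input in fp32; the QuantStub inside QuantizedModel quantizes it
# using the parameters collected during calibration.
input_tensor = preprocess(img).unsqueeze(0)
output_batch_tensor = quant_model(input_tensor)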

You’re passing in a quantized tensor, but the QuantStub’s job is to take an fp32 tensor and convert it; it’s not expecting to receive an already-quantized tensor. Beyond that, there are other parts of Quantization — PyTorch 2.0 documentation (the PTSQ API Example) that it doesn’t look like you’re doing correctly.
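
For reference, here is a rough sketch of what the eager-mode PTSQ recipe from that docs page looks like applied to your wrapper. The calibration loop and calibration_loader are placeholders I made up, eager-mode static quant runs on CPU (so no .to('cuda')), and the full docs example also fuses conv/bn/relu modules with torch.ao.quantization.fuse_modules, which I’ve skipped here:

import torch
from torchvision import models

# QuantizedModel and preprocess/img are the same as in your script
model_fp32 = models.resnet50(pretrained=True)   # keep on CPU for static quant
quant_model = QuantizedModel(model_fp32)
quant_model.eval()

# choose a backend-specific qconfig ('fbgemm' for x86, 'qnnpack' for ARM)
quant_model.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')

# insert observers
prepared = torch.ao.quantization.prepare(quant_model)

# calibration: run a few representative fp32 batches so the observers
# can record activation ranges (placeholder loop)
with torch.no_grad():
    for images, _ in calibration_loader:
        prepared(images)

# convert to the quantized model and run inference on fp32 inputs
quantized = torch.ao.quantization.convert(prepared)
output = quantized(preprocess(img).unsqueeze(0))

One more caveat: torchvision’s plain resnet50 does tensor additions in its residual blocks that eager-mode quantization can’t convert on its own, so people usually use torchvision.models.quantization.resnet50 (which replaces those ops and provides fuse_model()) rather than wrapping the vanilla model.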