Hello! I am trying to quantize a pretrained resnet50 model, but I am running into the following error:
NotImplementedError: Could not run 'aten::quantize_per_tensor' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::quantize_per_tensor' is only available for these backends: [CPU, CUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
I can't figure out why this is going wrong. If someone could help me out, that would be amazing. Here is my code:
import torch
import numpy as np
import time

class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model_fp32 = model
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.model_fp32(x)
        x = self.dequant(x)
        return x

model = torch.hub.load('pytorch/vision', 'resnet50', pretrained=True).to('cuda')
quant_model = QuantizedModel(model)
quant_model.eval()
quant_model.qconfig = torch.ao.quantization.default_qconfig
print(quant_model.qconfig)
quant_model = torch.ao.quantization.prepare(quant_model, inplace=False)
quant_model = torch.ao.quantization.convert(quant_model, inplace=False)

input_tensor = preprocess(img).unsqueeze(0)
input_tensor = torch.quantize_per_tensor(input_tensor, scale=1.0, zero_point=0, dtype=torch.quint8)
output_batch_tensor = quant_model(input_tensor)
I saw a thread very similar to this one, but I was not able to find the answer to my problem there.
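For comparison, here is a minimal eager-mode static quantization flow that does run for me on CPU. It is only a sketch: I swapped in a small toy model instead of resnet50 so it runs quickly, and I feed the converted model a plain float tensor (the QuantStub does the quantization), which may or may not match what you intend:

```python
import torch

# Minimal eager-mode static quantization sketch.
# Assumption: toy Conv+ReLU model stands in for resnet50.
class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.model_fp32 = model
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # float32 -> quint8
        x = self.model_fp32(x)
        x = self.dequant(x)     # quint8 -> float32
        return x

fp32_model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.ReLU(),
)

quant_model = QuantizedModel(fp32_model)
quant_model.eval()  # model stays on CPU
quant_model.qconfig = torch.ao.quantization.default_qconfig

prepared = torch.ao.quantization.prepare(quant_model, inplace=False)
# Calibration pass: run representative float inputs through the
# prepared model so the observers can record activation ranges.
with torch.no_grad():
    prepared(torch.randn(1, 3, 32, 32))
converted = torch.ao.quantization.convert(prepared, inplace=False)

# Inference on a plain float tensor -- no torch.quantize_per_tensor call,
# since the QuantStub inside the model handles quantization.
out = converted(torch.randn(1, 3, 32, 32))
print(out.dtype)  # torch.float32 (the DeQuantStub returns float)
```

Note the two differences from my failing code above: everything stays on CPU, and the input is a regular float tensor rather than one pre-quantized with torch.quantize_per_tensor.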