Error in running quantised model RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend

IgorKasianenko · May 15, 2022, 11:05am

Hello,
I am trying to quantize a model as described in the official tutorial, Quantization Recipe — PyTorch Tutorials 1.11.0+cu102 documentation.
I had no problem training and saving the model, but running it with TorchScript (jit), both in C++ and in Python, throws the same "not implemented" error. Any ideas?

import torch
import torchvision
model = torchvision.models.mobilenet_v2()
backend = "fbgemm"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
torch.jit.save(torch.jit.script(model_static_quantized), '/data/Igor/projects/torch-cpp/tutorial.pt')

module = torch.jit.load('/data/Igor/projects/torch-cpp/tutorial.pt')

inputs = torch.ones((1,3,224,224))

module(inputs)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-37-21f6333d0b06> in <module>
     13 inputs = torch.ones((1,3,224,224))
     14 
---> 15 module(inputs)

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchvision/models/mobilenetv2.py", line 11, in forward
  def forward(self: __torch__.torchvision.models.mobilenetv2.MobileNetV2,
    x: Tensor) -> Tensor:
    return (self)._forward_impl(x, )
            ~~~~~~~~~~~~~~~~~~~ <--- HERE
  def _forward_impl(self: __torch__.torchvision.models.mobilenetv2.MobileNetV2,
    x: Tensor) -> Tensor:
  File "code/__torch__/torchvision/models/mobilenetv2.py", line 15, in _forward_impl
    x: Tensor) -> Tensor:
    _0 = __torch__.torch.nn.functional.adaptive_avg_pool2d
    x0 = (self.features).forward(x, )
          ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    x1 = torch.reshape(_0(x0, [1, 1], ), [(torch.size(x0))[0], -1])
    return (self.classifier).forward(x1, )
  File "code/__torch__/torch/nn/modules/container/___torch_mangle_255.py", line 46, in forward
    _17 = getattr(self, "17")
    _18 = getattr(self, "18")
    input0 = (_0).forward(input, )
              ~~~~~~~~~~~ <--- HERE
    input1 = (_1).forward(input0, )
    input2 = (_2).forward(input1, )
  File "code/__torch__/torchvision/models/mobilenetv2/___torch_mangle_217.py", line 15, in forward
    _1 = getattr(self, "1")
    _2 = getattr(self, "2")
    input0 = (_0).forward(input, )
              ~~~~~~~~~~~ <--- HERE
    input1 = (_1).forward(input0, )
    return (_2).forward(input1, )
  File "code/__torch__/torch/nn/quantized/modules/conv.py", line 36, in forward
    else:
      input0 = input
    _6 = ops.quantized.conv2d(input0, self._packed_params, self.scale, self.zero_point)
         ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return _6
  def __getstate__(self: __torch__.torch.nn.quantized.modules.conv.Conv2d) -> Tuple[int, int, Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int], bool, Tuple[int, int], int, str, Tensor, Optional[Tensor], float, int, bool]:

Traceback of TorchScript, original code (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torchvision/models/mobilenetv2.py", line 198, in forward
    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
               ~~~~~~~~~~~~~~~~~~ <--- HERE
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 118, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 118, in forward
    def forward(self, input):
        for module in self:
            input = module(input)
                    ~~~~~~ <--- HERE
        return input
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/quantized/modules/conv.py", line 407, in forward
            input = F.pad(input, _reversed_padding_repeated_twice,
                          mode=self.padding_mode)
        return ops.quantized.conv2d(
               ~~~~~~~~~~~~~~~~~~~~ <--- HERE
            input, self._packed_params, self.scale, self.zero_point)
RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

QuantizedCPU: registered at ../aten/src/ATen/native/quantized/cpu/qconv.cpp:873 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at ../torch/csrc/jit/frontend/tracer.cpp:1019 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

tom · May 15, 2022, 5:50pm

It means that you cannot pass an fp32 tensor to your quantized model; you would have to quantize the input first.
If you add a QuantStub that is called at the beginning of forward and a DeQuantStub at the end, you can pass fp32.
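
A minimal sketch of the stub approach, assuming eager-mode post-training static quantization. TinyNet is a hypothetical toy model, the calibration data is random, and the engine is picked from whatever this build reports as supported (an assumption that one of fbgemm or qnnpack is available):

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyNet(nn.Module):
    """Hypothetical toy model standing in for the real network."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # fp32 -> quantized at the input
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # quantized -> fp32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


# Assumption: at least one of these engines is available in this build.
supported = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in supported else "qnnpack"
torch.backends.quantized.engine = backend

model = TinyNet().eval()
# Set the qconfig on the top-level module so the stubs are configured too.
model.qconfig = tq.get_default_qconfig(backend)

prepared = tq.prepare(model, inplace=False)
prepared(torch.randn(8, 3, 32, 32))  # calibration pass with stand-in random data
quantized = tq.convert(prepared, inplace=False)

# The converted model now accepts and returns ordinary fp32 tensors.
out = quantized(torch.ones(1, 3, 32, 32))
print(out.dtype, tuple(out.shape))
```

The stubs only take effect if the module that owns them has a qconfig and goes through both prepare and convert.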

Best regards

Thomas

IgorKasianenko · May 16, 2022, 10:54am

Thank you for your reply!
I had to follow a blog post to see how to fuse the model, and now it works.

import torch
import torchvision

class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model_fp32 = model
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        
    def forward(self, x):
        x = self.quant(x)
        x = self.model_fp32(x)
        x = self.dequant(x)
        return x
        
def fuse_resnet18(model):
    torch.quantization.fuse_modules(model, [["conv1", "bn1", "relu"]], inplace=True)
    for module_name, module in model.named_children():
        if "layer" in module_name:
            for basic_block_name, basic_block in module.named_children():
                torch.quantization.fuse_modules(basic_block, [["conv1", "bn1", "relu"], ["conv2", "bn2"]], inplace=True)
                for sub_block_name, sub_block in basic_block.named_children():
                    if sub_block_name == "downsample":
                        torch.quantization.fuse_modules(sub_block, [["0", "1"]], inplace=True)    

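model = torchvision.models.resnet18().eval()  # conv+bn fusion for post-training quantization expects eval mode
fuse_resnet18(model)
quantized_model = QuantizedModel(model)

backend = "fbgemm"
# Set the qconfig on the wrapper so that QuantStub/DeQuantStub are configured as well
quantized_model.qconfig = torch.quantization.get_default_qconfig(backend)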
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(quantized_model, inplace=False)
model_static_quantized = torch.quantization.convert(model, inplace=False)
torch.jit.save(torch.jit.script(model_static_quantized), '/data/Igor/projects/torch-cpp/tutorial.pt')

module = torch.jit.load('/data/Igor/projects/torch-cpp/tutorial.pt')

inputs = torch.ones((1,3,224,224))

module(inputs)

Y_Simson · April 24, 2023, 9:42am

Surely you meant:
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)

?

IgorKasianenko · April 24, 2023, 11:09am

Most likely, yes. Thanks, it took 11 months but I got a reply. I don’t think this is relevant for me right now, but it is good to know that it works.

WangFengtu1996 · August 29, 2024, 3:43am

I have the same problem when I try to run inference with a saved QAT model.
This is how I load the model:

        self._create_model()
        if quantized:

            self.model.qconfig = torch.ao.quantization.get_default_qconfig('x86')
            # fix Missing key(s) in state_dict
            self.model.fuse_model(is_qat=True)
            # self.model.train()
            torch.ao.quantization.prepare_qat(self.model, inplace=True)
            torch.ao.quantization.convert(self.model, inplace=True)
            #TODO  load dynamic name
            from pprint import pprint
            model_dict = torch.load('/home/wangft/workspace/weigh/train_model/output/run_289/TCN4_50_QAT_fp32_opset_12_2024-08-28-17-49-47_qat.pth', map_location=self.device)
            # fix Unexpected key(s) in state_dict: "quant.scale", "quant.zero_point".
            del model_dict['quant.scale']
            del model_dict['quant.zero_point']
            pprint(torch.load('/home/wangft/workspace/weigh/train_model/output/run_289/TCN4_50_QAT_fp32_opset_12_2024-08-28-17-49-47_qat.pth', map_location=self.device).keys())
            # exit()
            pprint(self.model)
            # exit()
            self.model.load_state_dict(model_dict)
            #BUG  NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend.
            # exit()

When I run inference with the model, I get the error NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. I do not understand why torch uses the CPU backend.
This is the printed model:

TCN4_50_QAT(
  (linear1): QuantizedLinear(in_features=2, out_features=250, scale=1.0, zero_point=0, qscheme=torch.per_channel_affine)
  (net): TCNN_Block2(
    (network): Sequential(
      (0): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(2,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (1): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(4,), dilation=(2,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (2): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(8,), dilation=(4,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (3): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(16,), dilation=(8,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (4): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(32,), dilation=(16,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (5): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(64,), dilation=(32,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
      (6): ResBlock2(
        (TCM_net): Sequential(
          (0): ConvBNReLU(
            (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
            (1): Identity()
            (2): Identity()
          )
          (1): DepthwiseSeparableConv2(
            (net): Sequential(
              (0): QuantizedConv1d(50, 50, kernel_size=(3,), stride=(1,), scale=1.0, zero_point=0, padding=(128,), dilation=(64,), groups=50)
              (1): Chomp1d()
              (2): ConvBNReLU(
                (0): QuantizedConvReLU1d(50, 50, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
                (1): Identity()
                (2): Identity()
              )
            )
          )
        )
        (skip_add): QFunctional(
          scale=1.0, zero_point=0
          (activation_post_process): Identity()
        )
      )
    )
  )
  (downsample): QuantizedConv1d(50, 25, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
  (downsample2): QuantizedConv1d(25, 1, kernel_size=(1,), stride=(1,), scale=1.0, zero_point=0)
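
On why the message names the 'CPU' backend: as explained earlier in the thread, this error appears when an fp32 tensor reaches a quantized op, and the first error message lists the kernel as registered for QuantizedCPU only. The sketch below reproduces that with a hypothetical one-layer model that has no QuantStub. The scale and zero_point for the input are arbitrary illustrative values, not calibrated ones, and the engine selection assumes fbgemm or qnnpack is available:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# Assumption: at least one of these engines is available in this build.
supported = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in supported else "qnnpack"
torch.backends.quantized.engine = backend

# Hypothetical one-layer model without QuantStub/DeQuantStub.
float_model = nn.Sequential(nn.Linear(2, 4)).eval()
float_model.qconfig = tq.get_default_qconfig(backend)
prepared = tq.prepare(float_model, inplace=False)
prepared(torch.randn(16, 2))  # calibration pass with stand-in random data
qmodel = tq.convert(prepared, inplace=False)

x = torch.randn(1, 2)  # ordinary fp32 tensor

# An fp32 input reaches the quantized linear op and fails.
try:
    qmodel(x)
    fp32_error = None
except RuntimeError as err:  # NotImplementedError is a subclass of RuntimeError
    fp32_error = err
print(type(fp32_error).__name__)

# Quantizing the input by hand lets the same model run.
xq = torch.quantize_per_tensor(x, scale=0.05, zero_point=64, dtype=torch.quint8)
yq = qmodel(xq)      # output is still a quantized tensor
y = yq.dequantize()  # back to fp32
print(xq.is_quantized, yq.is_quantized, y.dtype)
```

If the printed model contains no Quantize module at its input, either quantize the input like this or add the stubs to the model definition before prepare/convert.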

WangFengtu1996 · August 29, 2024, 3:53am

The documentation solved my problem:
Quantization — PyTorch 2.4 documentation