RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend

I am trying to quantize my pre-trained model.


import torch


class Model_tobeQ(torch.nn.Module):
    def __init__(self, model):
        super(Model_tobeQ, self).__init__()
        # Stubs mark where tensors are converted to/from int8.
        self.quant = torch.quantization.QuantStub()
        self.model = model
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        x = self.dequant(x)
        return x


# Later, inside main_worker (excerpt):
model_fp32 = Model_tobeQ(model)
model_fp32.to('cpu')
model_fp32.eval()

model_fp32.qconfig = torch.quantization.get_default_qconfig('fbgemm')
model = torch.quantization.prepare(model_fp32)

# Calibration pass over the eval set, then conversion to int8.
eval_measures = online_eval(model, dataloader_eval, gpu, ngpus_per_node)
myModel_int8 = torch.quantization.convert(model)
print(myModel_int8)
fake_input = torch.rand((1, 3, 512, 512))
myModel_eest = myModel_int8(fake_input)
print(myModel_eest)

The quantization seems to be working. Printing the quantized model, I can see:

        (0): QuantizedConv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), scale=46.11655044555664, zero_point=59, padding=(1, 1))
        (1): Interpolate()
        (2): QuantizedConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=82.63282775878906, zero_point=63, padding=(1, 1))
        (3): ReLU()
        (4): QuantizedConv2d(32, 1, kernel_size=(1, 1), stride=(1, 1), scale=16.16731071472168, zero_point=10)
        (5): ReLU(inplace=True)
.....
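
In case it helps narrow things down, here is a quick diagnostic sketch to list any convolutions that convert left as plain float modules (it assumes myModel_int8 from the code above; converted convs are torch.nn.quantized.Conv2d rather than torch.nn.Conv2d, so an exact type check only matches the unconverted ones):

import torch

# Report submodules that are still plain float Conv2d after convert().
for name, module in myModel_int8.named_modules():
    if type(module) is torch.nn.Conv2d:
        print('still float conv:', name)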

But when I run inference, I get:

File "efficient_seg_train_main_quantized.py", line 538, in main_worker
    myModel_eest = myModel_int8(fake_input)
  File "/home/big_tree/miniconda3/envs/big_tree/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/big_tree/project/big_tree/seg/train/Network_train_super_loss_QAT/pytorch_from_begin/Network/Network_net_custom.py", line 134, in forward
    layer_1 = self.pretrained.layer1(x)
  File "/home/big_tree/miniconda3/envs/big_tree/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/big_tree/miniconda3/envs/big_tree/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/big_tree/miniconda3/envs/big_tree/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/big_tree/miniconda3/envs/big_tree/lib/python3.7/site-packages/torch/nn/quantized/modules/batchnorm.py", line 17, in forward
    self.running_var, self.eps, self.scale, self.zero_point)
RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::thnn_conv2d_forward' is only available for these backends: [CPU, CUDA, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

QuantizedCPU: registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/native/quantized/cpu/qbatch_norm.cpp:385 [kernel]
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/torch/csrc/jit/frontend/tracer.cpp:999 [backend fallback]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1616554800319/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

  1. From the traceback, I can see that the error happens right at the beginning, where the input enters self.model (the call to self.pretrained.layer1); see the minimal sketch after this list.

  2. I also notice that the line myModel_int8 = torch.quantization.convert(model) emits this warning:

/home/big_tree/miniconda3/envs/bigtree/lib/python3.7/site-packages/torch/quantization/observer.py:123: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
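
Here is the minimal sketch mentioned in point 1: my guess is that a quantized tensor is reaching a module that is still a plain float conv, since feeding a quantized tensor into an unconverted torch.nn.Conv2d reproduces this kind of error (a sketch only; the exact op name in the message may differ between PyTorch builds):

import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # float conv, not converted
x = torch.rand(1, 3, 16, 16)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0,
                               dtype=torch.quint8)  # quantized input
conv(xq)  # RuntimeError: Could not run ... with arguments from the 'QuantizedCPU' backend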

Please help.

@Vasiliy_Kuznetsov Any suggestions will be appreciated.