RuntimeError: Could not run 'quantized::conv2d_relu.new' with arguments from the 'QuantizedCUDA' backend. 'quantized::conv2d_relu.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA,...]

Hello all, hope you are having a great day.
I quantized a model using graph mode post-training static quantization, and everything seemed to go smoothly without a hitch.
However, upon loading the newly quantized model and trying to do a forward pass, I get this error:

Evaluating data/angles.txt...
  0%|                                                                                                                                                             | 0/6000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/internet/shishosama/embeder_moder_training/graph_quantizer_static.py", line 119, in <module>
    lfw_test(jit_model)
  File "/mnt/internet/shishosama/embeder_moder_training/lfw_eval.py", line 350, in lfw_test
    evaluate(model)
  File "/mnt/internet/shishosama/embeder_moder_training/lfw_eval.py", line 111, in evaluate
    output = model(imgs)
  File "/root/anaconda3/envs/shishosama/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models_new/___torch_mangle_1853.py", line 23, in forward
    input_2_quant = torch.quantize_per_tensor(input, 0.037445519119501114, 57, 13)
    _0 = getattr(self, "quantized._jit_pass_packed_weight_0")
    _1 = ops.quantized.conv2d_relu(input_2_quant, _0, 0.0094706285744905472, 0)
         ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _6_dequant = torch.dequantize(_1)
    input0 = torch.feature_dropout(_6_dequant, 0., False)

Traceback of TorchScript, original code (most recent call last):

graph(%a_quant, %packed_params, %r_scale, %r_zero_point, %r_dtype, %stride, %padding, %dilation, %groups):
        %r_quant = quantized::conv2d_relu(%a_quant, %packed_params, %r_scale, %r_zero_point)
                   ~~~~~~~~~ <--- HERE
        return (%r_quant) 
RuntimeError: Could not run 'quantized::conv2d_relu.new' with arguments from the 'QuantizedCUDA' backend. 'quantized::conv2d_relu.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

QuantizedCPU: registered at /pytorch/aten/src/ATen/native/quantized/cpu/qconv.cpp:858 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

What am I missing here? My previous dynamically quantized models didn't have this issue (they were also quantized using graph mode),
so I'm not sure what's happening here. I also get this exact error when I try to do a forward pass with the model quantized using eager mode (here is its own thread).

In case the quantized model is of some use, here it is: https://gofile.io/d/zyDEaY
Any help is greatly appreciated.

Thank God, I found the issue at last!
As the error states (obvious in hindsight!), my model was on the CPU but my input was on CUDA. Setting the data to be on the CPU as well fixed this issue.
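
In case it helps anyone else, here is roughly what the fix looks like (a minimal sketch; the filename and input shape are placeholders for my own setup, not part of any standard API). The statically quantized kernels here only exist for the QuantizedCPU backend, so both the scripted model and its inputs have to live on the CPU:

import torch

# Load the quantized TorchScript model onto the CPU
# ("quantized_model.pt" is a placeholder filename).
jit_model = torch.jit.load("quantized_model.pt", map_location="cpu")
jit_model.eval()

# Stand-in for a batch that was prepared on the GPU, as in my eval loop.
device = "cuda" if torch.cuda.is_available() else "cpu"
imgs = torch.randn(1, 3, 112, 112, device=device)

with torch.no_grad():
    # Moving the batch to the CPU avoids the QuantizedCUDA error.
    output = jit_model(imgs.cpu())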

Hi.
How did you move the data to the CPU?

Use .cpu() on your data tensor.
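
For instance (a small sketch; note that .cpu() returns a new tensor rather than modifying the original in place, so you need to reassign it):

import torch

# Hypothetical batch; moved to the GPU only if one is available.
imgs = torch.randn(4, 3, 112, 112)
imgs = imgs.cuda() if torch.cuda.is_available() else imgs

imgs = imgs.cpu()    # reassign: .cpu() returns a CPU copy of the tensor
print(imgs.device)   # prints: cpu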