Hi!
I followed tutorials/quantization and tried post-training static quantization (PTQ) on MobileNetV2 from torchvision.
However, when I try to predict with the quantized model, I get the error below and cannot run it. How can I solve this problem?
By the way, do I need to insert QuantStub() and DeQuantStub() in forward() when doing PTQ?
I'm confused because there are so many ways to do this.
What is the correct way to do PTQ in PyTorch 1.7.1? I put a sketch of my current understanding after the doc links below.
Docs I read
- Quantization — PyTorch 1.7.1 documentation
- torch.quantization — PyTorch 1.7.1 documentation
- Quantization Recipe — PyTorch Tutorials 1.7.1 documentation
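For context, here is how I understood the stub insertion from the eager-mode tutorial. The wrapper class below is my own sketch (StubbedModel is not from the docs), just to show what I mean:

import torch
import torchvision

class StubbedModel(torch.nn.Module):
    # My sketch: QuantStub/DeQuantStub mark where tensors cross the
    # float <-> quantized boundary in eager-mode quantization.
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> quantized at the input
        self.model = model
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> fp32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        return self.dequant(x)

float_model = StubbedModel(torchvision.models.mobilenet_v2(pretrained=True)).eval()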
Error
Traceback (most recent call last):
File "ptq_imagenet_pth.py", line 137, in <module>
res = model_static_quantized(x.clone().detach().to(device, dtype=torch.float))
...
...
RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CUDA' backend. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].
QuantizedCPU: registered at /pytorch/aten/src/ATen/native/quantized/cpu/qconv.cpp:858 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
PTQ script
import torch
import torchvision

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()

# Use the fbgemm backend for x86 server CPUs.
backend = "fbgemm"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend

# Insert observers, then convert the model to its quantized counterpart.
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
model_static_quantized = model_static_quantized.to(device)

# x is an input tensor of shape (100, 3, 224, 224); this is line 137 from the traceback.
res = model_static_quantized(x.clone().detach().to(device, dtype=torch.float))
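Also, as far as I can tell from the recipe, a calibration pass belongs between prepare() and convert(), which my script above does not have. Is something like this sketch what is expected? (calibration_loader is a placeholder DataLoader of my own, not something from the docs.)

# My sketch of the calibration step I think the recipe describes; run on CPU.
# calibration_loader is a hypothetical DataLoader over representative inputs.
model_prepared = torch.quantization.prepare(model, inplace=False)
with torch.no_grad():
    for images, _ in calibration_loader:
        model_prepared(images)  # observers record activation ranges
model_static_quantized = torch.quantization.convert(model_prepared, inplace=False)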
Environment
- Ubuntu: 18.04
- CUDA: 11.0
- Python: 3.6.10
- PyTorch: 1.7.1
- torchvision: 0.8.2
Thank you!