How to do static quantization (PTQ) with PyTorch 1.7.1

Hi!
I followed the quantization tutorials and tried to apply post-training static quantization (PTQ) to MobileNetV2 from torchvision.
However, when I ran inference with the quantized model, I got the following error and could not run it. How can I solve this problem?

By the way, do I need to insert QuantStub() and DeQuantStub() in the forward pass when I do PTQ?
I’m confused because there are so many ways to do this.
What is the correct way to do PTQ in PyTorch 1.7.1?
Quantization — PyTorch 1.7.1 documentation
torch.quantization — PyTorch 1.7.1 documentation
Quantization Recipe — PyTorch Tutorials 1.7.1 documentation

Error

Traceback (most recent call last):
  File "ptq_imagenet_pth.py", line 137, in <module>
    res = model_static_quantized(x.clone().detach().to(device, dtype=torch.float))

...
...

RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CUDA' backend. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

QuantizedCPU: registered at /pytorch/aten/src/ATen/native/quantized/cpu/qconv.cpp:858 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

PTQ script

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()
backend = "fbgemm"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
model_static_quantized = model_static_quantized.to(device)
# x is an input tensor of shape (100, 3, 224, 224)
res = model_static_quantized(x.clone().detach().to(device, dtype=torch.float))

Environment

  • Ubuntu: 18.0
  • CUDA: 11.0
  • Python: 3.6.10
  • PyTorch: 1.7.1
  • torchvision: 0.8.2

Thank you!

Quantized inference is not supported on CUDA at the moment. You can move the model to CPU and it should work.
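
Something like this (a minimal sketch based on the script above):

# Quantized ops such as quantized::conv2d are only implemented for the
# QuantizedCPU backend in 1.7.1, so both the converted model and its input
# must live on the CPU.
cpu = torch.device("cpu")
model_static_quantized = model_static_quantized.to(cpu)

with torch.no_grad():
    res = model_static_quantized(x.to(cpu, dtype=torch.float))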

Thank you for your response!

When I used the CPU, I got the error below, which looks almost the same as the CUDA one.

RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].


I get the same type of error when trying this on a YOLOv3 implementation. What’s the main cause?

The example listed in Quantization — PyTorch 1.7.1 documentation shows how to use quant/dequant stubs to statically quantize the model using eager mode.

The error you are seeing is probably because you are missing a quant stub operator before the conv operator.
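
Roughly, the eager-mode recipe looks like this (a sketch only; QuantWrappedModel is an illustrative name, and random data stands in for a real calibration set):

import torch
import torchvision

class QuantWrappedModel(torch.nn.Module):
    """Quantize inputs on entry, dequantize outputs on exit."""
    def __init__(self, model_fp32):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.model_fp32 = model_fp32
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # float -> quint8 at the model boundary
        x = self.model_fp32(x)
        return self.dequant(x)   # quint8 -> float at the model boundary

model = QuantWrappedModel(torchvision.models.mobilenet_v2(pretrained=True)).eval()

backend = "fbgemm"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend

# prepare() inserts observers, the calibration pass records activation ranges,
# and convert() swaps in the quantized modules. Everything stays on the CPU.
prepared = torch.quantization.prepare(model, inplace=False)
with torch.no_grad():
    prepared(torch.randn(8, 3, 224, 224))
model_int8 = torch.quantization.convert(prepared, inplace=False)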

Thanks for your reply!

I understand that I have to insert quant/dequant stubs.
But I don’t know how to insert them into a model whose architecture I don’t control, such as a pretrained model.
Do I have to look up the model’s architecture and write my own model class with the quant/dequant stubs added?
If I use torch.quantization.QuantWrapper(module), will that solve the problem?

Thank you!

Do I have to look up the model’s architecture and write my own model class with the quant/dequant stubs added?

If you wish to skip quantizing certain layers, then yes, this is the recommended way with eager mode quantization.

If I use torch.quantization.QuantWrapper(module), will that solve the problem?

It will add a quant/dequant stub pair around the entire model, not around individual modules.
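
For example (a sketch using the same MobileNetV2 as above):

# QuantWrapper is equivalent to writing the wrapper class by hand: it places
# one QuantStub/DeQuantStub pair around the whole module.
model_fp32 = torchvision.models.mobilenet_v2(pretrained=True).eval()
wrapped = torch.quantization.QuantWrapper(model_fp32)
wrapped.qconfig = torch.quantization.get_default_qconfig("fbgemm")
# prepare/calibrate/convert then proceed as usual.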

Thank you very much for your support, @supriyar.

I want to quantize the entire model, so I used QuantWrapper().
However, I couldn’t quantize it successfully and got the error below.

RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, CUDA, MkldnnCPU, SparseCPU, SparseCUDA, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

I also tried to insert the quant/dequant stubs into the model myself.

class QuantizedMobileNetV2(nn.Module):
    def __init__(self, model_fp32):
        super(QuantizedMobileNetV2, self).__init__()
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.model_fp32 = model_fp32

    def forward(self, x):
        x = self.quant(x)
        x = self.model_fp32(x)
        x = self.dequant(x)
        return x

But I got the same error as when I used QuantWrapper().

It was solved in this topic.

Thanks all!!!
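
For anyone who lands on this thread with the same aten::add error: in eager mode, MobileNetV2’s InvertedResidual blocks add the skip connection with a plain float +, which has no QuantizedCPU kernel. One common workaround (a sketch, not necessarily the exact fix from the linked topic) is to use the quantization-ready model in torchvision.models.quantization, which already contains the stubs and routes those additions through FloatFunctional:

import torch
import torchvision

# Quantizable MobileNetV2: QuantStub/DeQuantStub are built in and the residual
# additions go through nn.quantized.FloatFunctional instead of a raw "+".
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.eval()
model.fuse_model()  # fuse Conv+BN+ReLU blocks before quantization

backend = "fbgemm"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend

prepared = torch.quantization.prepare(model, inplace=False)
with torch.no_grad():
    prepared(torch.randn(8, 3, 224, 224))   # calibration (use real data in practice)
model_int8 = torch.quantization.convert(prepared, inplace=False)

with torch.no_grad():
    out = model_int8(torch.randn(1, 3, 224, 224))   # runs on the CPU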