Model quantization in PyTorch and use in Android

I have a PyTorch model saved as a .pth file.
I want to use a quantized model in my Android project.
How can I obtain a quantized .pt model?
I have tried a few approaches, but I have not been able to get it working.
However, I have already managed to convert the model to a (non-quantized) .pt file and use it in Android.

Hello @mohit7,

Did you try to follow our quantization tutorial https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html?

If so, which of the steps does not work or needs more details/comments?
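
For reference, the flow in that tutorial boils down to roughly the following sketch (eager-mode API; `MyModel`, `calibration_loader`, and `example_input` are placeholders, and the model needs `QuantStub`/`DeQuantStub` around the float part, as the tutorial shows):

```python
import torch

model = MyModel()   # placeholder: your model, wrapped with QuantStub/DeQuantStub
model.eval()        # post-training static quantization runs on an eval model

model.fuse_model()  # fuse conv+bn(+relu) patterns first
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')  # mobile; 'fbgemm' on x86
torch.backends.quantized.engine = 'qnnpack'
torch.quantization.prepare(model, inplace=True)   # insert observers

with torch.no_grad():                             # calibrate on sample batches
    for images, _ in calibration_loader:          # placeholder data loader
        model(images)

torch.quantization.convert(model, inplace=True)   # swap in quantized modules

traced = torch.jit.trace(model, example_input)    # placeholder example tensor
traced.save("model_quantized.pt")                 # this .pt can be loaded on Android
```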

Hi @IvanKobzarev, yes, I followed the steps given, but it throws an error.
My model is a ResNet-34 base model with some additional functionality on top.
I am on a Windows machine.

Error

RuntimeError: Didn't find engine for operation quantized::conv_prepack NoQEngine (operator () at ..\aten\src\ATen\native\quantized\cpu\qconv_prepack.cpp:264)
(no backtrace available)

This looks like the same issue as https://github.com/pytorch/pytorch/issues/29327 . In short, it looks like quantization is not currently supported on Windows.
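
On recent builds you can check which quantized engines are available; I believe a Windows build without FBGEMM support lists no usable engine, which matches the NoQEngine error above:

```python
import torch
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', 'qnnpack']
```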

Hey @David_Reiss, I moved from Windows to Linux for the quantization.
As far as I know, quantized BatchNorm is still not supported, so the BatchNorm layers have to be fused.
I tried to fuse the BatchNorm layers with the convolution layers, but the fusion does not happen, and no error is thrown either.
When I then run jit trace, it throws the same error, which suggests the BatchNorm layers were never fused.

@mohit7, can you post the error you’re getting in your latest version?

@supriyar or @raghuramank100, do you know how to ensure that BN gets folded during quantization?

@mohit7, it might be useful to take a look at the quantized models uploaded in torchvision.
Here is a link to the resnet model - https://github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py
I think if you follow the same flow for your model and re-implement fuse_model(self), it should work.
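
As a sketch, the method could look roughly like this (modeled on the torchvision file linked above; this assumes a recent torchvision and that your backbone is built from the quantizable blocks, which also replace the residual `+=` with `FloatFunctional`, since quantized tensors do not support plain addition):

```python
import torch
from torchvision.models.quantization.resnet import QuantizableBasicBlock

def fuse_model(self):
    # Fuse the ResNet stem: conv1 + bn1 + relu (stock ResNet attribute names)
    torch.quantization.fuse_modules(self, ['conv1', 'bn1', 'relu'], inplace=True)
    # Each quantizable block fuses its own conv+bn(+relu) pairs internally
    for m in self.modules():
        if type(m) == QuantizableBasicBlock:
            m.fuse_model()
```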

@David_Reiss, the error I got is

RuntimeError: No function is registered for schema aten::native_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) on tensor type QuantizedCPUTensorId; available functions are CPUTensorId, MkldnnCPUTensorId, VariableTensorId

I have followed all the steps correctly, but batch normalization is not fusing.
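
For example, listing the BatchNorm modules after the fuse step (fused BN layers should have been replaced by nn.Identity) shows they are all still there:

```python
import torch.nn as nn

# After a successful fuse, no BatchNorm2d modules should remain:
leftover = [name for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)]
print(leftover)  # non-empty here, so the BN layers were never fused
```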

@supriyar Can you help?

@mohit7 did you modify the fuse_model function to work with your model (https://github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py#L45)?

Currently fusion is only supported for conv + bn or conv + bn + relu. Does your model have a use case other than that?
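
In code, those two patterns are expressed as lists of submodule names (the names below are placeholders for modules in your model). One common gotcha worth double-checking:

```python
import torch

# conv + bn, and conv + bn + relu, fused in place:
torch.quantization.fuse_modules(
    model, [['conv1', 'bn1'], ['conv2', 'bn2', 'relu2']], inplace=True)

# Gotcha: without inplace=True, fuse_modules deep-copies and returns the
# fused model, leaving `model` untouched -- fusion then looks like a no-op:
# fused = torch.quantization.fuse_modules(model, [['conv1', 'bn1']])
```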

@mohit7: Can you share the code for your model? Batch norm fusion is supported, but you need to call it explicitly prior to calling prepare/convert to quantize your model.
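
i.e., the fuse step has to come first; roughly (a sketch, assuming the fuse_model method discussed above):

```python
import torch

model.eval()
model.fuse_model()  # fuse first, while everything is still floating point
model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
torch.quantization.prepare(model, inplace=True)   # then prepare (observers)
# ... run calibration batches through the model here ...
torch.quantization.convert(model, inplace=True)   # then convert
```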

Thanks,