Model quantization in PyTorch and use in Android

I have a PyTorch model saved as a .pth file and I want to use a quantized version of it in my Android project.
How can I produce a quantized .pt model?
I have tried a few approaches but have not been able to get it working.
Converting to a plain (unquantized) .pt model and using it in Android already works for me.

Hello @mohit7,

Did you try to follow our quantization tutorial https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html?

If yes, which of the steps does not work or needs more details/comments?

Hi @IvanKobzarev, yes, I followed the steps given but it throws an error.
My model is a ResNet-34 base model, with some additional functionality on top of it.
I am on a Windows machine.

Error

RuntimeError: Didn't find engine for operation quantized::conv_prepack NoQEngine (operator () at ..\aten\src\ATen\native\quantized\cpu\qconv_prepack.cpp:264)
(no backtrace available)

This looks like the same issue as https://github.com/pytorch/pytorch/issues/29327. In short, it looks like quantization is not currently supported on Windows.
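One quick check (just a sketch, assuming a reasonably recent PyTorch build) is to print which quantized engines your build exposes; if 'fbgemm' is missing, static quantization will fail with the NoQEngine error above:

```python
import torch

# Engines this build can run quantized ops on; 'fbgemm' is the x86 backend
# used for static quantization. If it is not listed, quantized::conv_prepack
# has no engine to dispatch to.
print(torch.backends.quantized.supported_engines)

# On a build that does include FBGEMM (e.g. Linux x86), select it explicitly:
# torch.backends.quantized.engine = 'fbgemm'
```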

Hey @David_Reiss, so I moved from Windows to Linux for this quantization.
As far as I know, BatchNormalization is still not supported directly, so we have to fuse the BatchNormalization layers.
I tried to fuse the BatchNormalization layers with the Convolution layers, but the fusion does not happen and no error is thrown either.
When I then do the JIT trace, it throws the same error, which suggests the BatchNormalization layers still need to be fused.
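Here is roughly what I am trying (a simplified sketch; the Block module and the layer names stand in for the Conv/BN/ReLU groups in my ResNet-34 backbone):

```python
import torch
import torch.nn as nn
from torch.quantization import fuse_modules

# Minimal stand-in for one Conv + BN + ReLU group from my model.
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = Block().eval()  # fusion has to be done in eval mode
# Fuse by submodule name; BatchNorm gets folded into the preceding Conv.
fused = fuse_modules(m, [['conv', 'bn', 'relu']], inplace=False)
print(fused)  # conv becomes a fused ConvReLU2d, bn and relu become Identity
```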

@mohit7, can you post the error you’re getting in your latest version?

@supriyar or @raghuramank100, do you know how to ensure that BN gets folded during quantization?

@mohit7, it might be useful to take a look at the quantized models uploaded in torchvision.
Here is a link to the resnet model - https://github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py
I think if you follow the same flow for your model by re-implementing def fuse_model(self):, it should work.
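Roughly, the pattern looks like this (a sketch only; QuantizableModel, the backbone argument, and the layer names 'conv1'/'bn1'/'relu' are placeholders for your own modules):

```python
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, fuse_modules

# Torchvision-style wrapper: QuantStub/DeQuantStub around the network plus an
# explicit fuse_model(), mirroring torchvision/models/quantization/resnet.py.
class QuantizableModel(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.quant = QuantStub()      # marks where the input gets quantized
        self.backbone = backbone
        self.dequant = DeQuantStub()  # marks where the output gets dequantized

    def forward(self, x):
        x = self.quant(x)
        x = self.backbone(x)
        return self.dequant(x)

    def fuse_model(self):
        # Fuse each Conv + BN (+ ReLU) group by submodule name; this must be
        # called before prepare/convert.
        fuse_modules(self.backbone, [['conv1', 'bn1', 'relu']], inplace=True)
```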

@David_Reiss the error I got is

RuntimeError: No function is registered for schema aten::native_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) on tensor type QuantizedCPUTensorId; available functions are CPUTensorId, MkldnnCPUTensorId, VariableTensorId

I have followed all the steps correctly, but batch normalization is not fusing.

@supriyar Can you help?

@mohit7 did you modify the fuse_model function to work with your model (https://github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py#L45)?

Currently fusion is only supported for conv + bn or conv + bn + relu. Does your model have a use case other than that?

@mohit7: Can you share the code for your model? Batch norm fusion is supported, but you need to call it explicitly prior to calling prepare/convert to quantize your model.
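For reference, the overall order is: fuse first, then prepare, calibrate, convert, and finally trace. A minimal sketch using torchvision's quantizable resnet18 as a stand-in for your model (your own model would call its own fuse_model and use real calibration data):

```python
import torch
from torchvision.models.quantization import resnet18

model = resnet18(pretrained=False).eval()
model.fuse_model()  # fold BatchNorm into the preceding Conv layers first
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)   # insert observers

# Calibrate with representative data (random input here only as a stand-in).
with torch.no_grad():
    model(torch.rand(8, 3, 224, 224))

torch.quantization.convert(model, inplace=True)   # swap in quantized modules

# Trace and save for the Android app.
traced = torch.jit.trace(model, torch.rand(1, 3, 224, 224))
traced.save('model_quantized.pt')
```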

Thanks,

I followed the tutorial on a custom model. It did make the model much smaller in size; however, the inference time increased heavily when testing it on Android. Any clues as to why this is? Also, on top of that, FloatFunctional().add(x, y) really slows down inference a lot.

Hi @gigadeplex, FloatFunctional().add() increases inference time because it first converts the incoming quantized tensors to float tensors, performs the addition, and then converts the result back to a quantized tensor.
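One thing worth checking (a sketch, assuming eager-mode static quantization): if FloatFunctional() is created inline inside forward(), convert() never sees it and cannot replace it with its quantized counterpart QFunctional, which adds quantized tensors directly. Registering it once in __init__ usually helps:

```python
import torch
import torch.nn as nn

class AddBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a submodule so torch.quantization.convert() can swap it
        # for QFunctional, which performs the add on quantized tensors directly.
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x, y):
        # Reuse the registered instance instead of calling
        # FloatFunctional().add(x, y) inline, which convert() cannot see.
        return self.skip_add.add(x, y)
```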