Generic static quantization

Hi,

I’m trying to quantise pre-trained models from torchvision, and I’ve hit an obstacle I’m struggling to get past. I’ve been able to fuse layers and replace ReLUs as needed, and I’ve used QuantWrapper to put the quant and dequant stubs around the forward function; then I can prepare and convert using the quantisation tools.
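
In case it helps to see the steps spelled out, this is roughly the workflow I mean (a minimal sketch using ResNet-18 as an example; the exact fuse list depends on the architecture):

```python
import torch
import torchvision
from torch.quantization import QuantWrapper, fuse_modules, get_default_qconfig, prepare, convert

# Minimal sketch of the eager-mode static quantisation workflow described above.
model = torchvision.models.resnet18(pretrained=True).eval()

# Fuse conv + bn + relu where they appear back to back (only the stem is shown here).
model = fuse_modules(model, [['conv1', 'bn1', 'relu']])

# Wrap the model so QuantStub/DeQuantStub surround the whole forward pass.
model = QuantWrapper(model)
model.qconfig = get_default_qconfig('fbgemm')

prepare(model, inplace=True)

# Calibrate with representative data (random data here is only a placeholder).
with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

convert(model, inplace=True)
# Note: running the converted model can still fail on tensor ops written directly
# in forward() (e.g. the residual add), which is the problem described below.
```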

The problem I’m at now is that I get an error every time I try to run a model with an operation that needs to be quantised. For example, InceptionV3 uses some such operations during the forward pass, and the errors look like:

RuntimeError: Didn’t find kernel to dispatch to for operator ‘aten::mul’. Tried to look up kernel for dispatch key ‘QuantizedCPUTensorId’.

I have been able to run these networks by editing the model to replace these operations with nn.quantized.FloatFunctional() operations. Is there any way I can get past this without editing each network individually?
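
For reference, the kind of edit I mean looks like this (a hypothetical module, not the actual InceptionV3 code):

```python
import torch
import torch.nn as nn

# Hypothetical example of the edit: a bare tensor multiply in forward() is
# replaced with nn.quantized.FloatFunctional so it can be observed and quantised.
class ScaledBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.mul = nn.quantized.FloatFunctional()

    def forward(self, x):
        y = self.conv(x)
        # return y * 0.5                      # dispatches aten::mul, fails on quantised tensors
        return self.mul.mul_scalar(y, 0.5)    # observable, quantisable equivalent
```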

Hi @Joe_Heyward,

Currently, replacing these non-parametric operations with FloatFunctional is the only way. More details are here: https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html#model-architecture

The reason the replacement is needed is that, to quantize these operations, we need to observe their input/output values.
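
To make that concrete (continuing the hypothetical ScaledBlock sketch from the post above, assuming the usual eager-mode workflow): once the multiply goes through a FloatFunctional module, prepare() can attach an observer to it and convert() can swap it for its quantized counterpart using the observed statistics, whereas a bare `*` in forward() is invisible to both steps.

```python
import torch
from torch.quantization import get_default_qconfig, prepare, convert

m = ScaledBlock().eval()
m.qconfig = get_default_qconfig('fbgemm')

prepare(m, inplace=True)           # inserts observers, including one on the FloatFunctional
m(torch.randn(1, 3, 16, 16))       # calibration pass records activation statistics

convert(m, inplace=True)
print(type(m.mul))                 # FloatFunctional has been swapped for its quantized QFunctional
```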

We are also working on automating this with quantization after scripting (graph mode quantization), but this feature is still in progress.

Ok, thanks! What’s the best way to keep up to date on this?

PS: I thought I would be able to get past this if I could get the quant and dequant tight enough around the network forward pass, so I wrote a function similar to https://pytorch.org/docs/stable/_modules/torch/quantization/quantize.html#add_quant_dequant, which adds a QuantWrapper around every submodule down to a certain level (a sketch follows the error below), but this gives me a separate error:

RuntimeError: size mismatch, m1: [20160 x 224], m2: [1024 x 100] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:197
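
The wrapping function was roughly along these lines (a sketch of the idea, not the exact code):

```python
import torch.nn as nn
from torch.quantization import QuantWrapper

# Sketch of the helper described above: wrap every submodule down to a given
# depth in QuantWrapper, so quant/dequant happen immediately around each
# submodule's forward rather than around the whole network.
def add_quant_dequant_to_depth(module, depth=1):
    if depth == 0:
        return QuantWrapper(module)
    for name, child in list(module.named_children()):
        setattr(module, name, add_quant_dequant_to_depth(child, depth - 1))
    return module
```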

Keeping an eye on the release notes (https://github.com/pytorch/pytorch/releases) is one way to keep up to date on this. Another, more involved, way is following quantization-related diffs.

It’s hard to say what’s wrong here without looking at the code.