I am using eager mode quantization, but I want to exclude some layers from being quantized.
I am following the tutorial here: Practical Quantization in PyTorch | PyTorch
To skip some layers, I wrote the following code:
```python
for name, module in fusedModel.named_modules():
    if name in sortedSensitivityDict:
        module.qconfig = None
        print("skipping quant for", name)
```
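In case it helps, the surrounding eager mode flow I am following (per the tutorial) looks roughly like this; calibration details are elided:

```python
from torch.ao.quantization import get_default_qconfig, prepare, convert

fusedModel.eval()
fusedModel.qconfig = get_default_qconfig("fbgemm")

# the skip loop above runs here, before prepare()

prepared = prepare(fusedModel)
# calibration: run representative inputs through `prepared`
quantized = convert(prepared)
```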
However, when I test the model now I get the following error:
```
Could not run 'aten::_slow_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
```
If I understand correctly, this is because the layers with `qconfig = None` stay in floating point, so they receive quantized tensors while expecting float (dequantized) tensors.
Is there a way, inside my loop, to insert an instruction that dequantizes the data before each skipped layer and re-quantizes it after? Or is there another workaround for this? I have sketched what I mean below.
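For concreteness, this is roughly what I have in mind, a minimal sketch rather than a tested solution. `DequantWrapper` and `set_module` are hypothetical helper names I made up, and the sketch assumes the skipped layers are leaf modules whose names in `sortedSensitivityDict` match those from `named_modules()`:

```python
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub

class DequantWrapper(nn.Module):
    """Dequantize -> run the float layer -> re-quantize."""
    def __init__(self, float_module):
        super().__init__()
        self.dequant = DeQuantStub()
        self.module = float_module  # qconfig = None, stays in float
        self.quant = QuantStub()    # picks up the model's qconfig in prepare()

    def forward(self, x):
        x = self.dequant(x)   # quantized tensor -> float tensor
        x = self.module(x)    # ordinary float computation
        return self.quant(x)  # float tensor -> quantized tensor

def set_module(model, name, new_module):
    """Replace the submodule at dotted path `name` with `new_module`."""
    parent_name, _, child_name = name.rpartition(".")
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, child_name, new_module)

# before prepare(): wrap each skipped layer instead of only clearing its qconfig
for name in list(sortedSensitivityDict):
    float_mod = fusedModel.get_submodule(name)
    float_mod.qconfig = None
    set_module(fusedModel, name, DequantWrapper(float_mod))
```

Does this look like the right approach, or is there a more standard way to do it in eager mode?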