I am using eager mode quantization, but I want to exclude some layers from being quantized.
I am following the tutorial here: Practical Quantization in PyTorch | PyTorch
To skip some layers, I wrote the following code:
```python
for name, module in fusedModel.named_modules():
    if name in sortedSensitivityDict:
        module.qconfig = None
        print("skipping quant for", name)
```
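In case it helps, the surrounding eager mode flow I am following (per the tutorial) looks roughly like this; calibration details are elided:

```python
from torch.ao.quantization import get_default_qconfig, prepare, convert

fusedModel.eval()
fusedModel.qconfig = get_default_qconfig("fbgemm")

# the skip loop above runs here, before prepare()

prepared = prepare(fusedModel)
# calibration: run representative inputs through `prepared`
quantized = convert(prepared)
```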
However, when I test the model now I get the following error:
```
Could not run 'aten::_slow_conv2d_forward' with arguments from the 'QuantizedCPU' backend.
```
If I understand correctly, this is because the layers with `qconfig = None` stay in floating point, so they receive quantized tensors while expecting float (dequantized) tensors.
Is there a way, inside my loop, to insert an instruction that dequantizes the data before each skipped layer and re-quantizes it after? Or is there another workaround for this? I have sketched what I mean below.
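For concreteness, this is roughly what I have in mind, a minimal sketch rather than a tested solution. `DequantWrapper` and `set_module` are hypothetical helper names I made up, and the sketch assumes the skipped layers are leaf modules whose names in `sortedSensitivityDict` match those from `named_modules()`:

```python
import torch.nn as nn
from torch.ao.quantization import QuantStub, DeQuantStub

class DequantWrapper(nn.Module):
    """Dequantize -> run the float layer -> re-quantize."""
    def __init__(self, float_module):
        super().__init__()
        self.dequant = DeQuantStub()
        self.module = float_module  # qconfig = None, stays in float
        self.quant = QuantStub()    # picks up the model's qconfig in prepare()

    def forward(self, x):
        x = self.dequant(x)   # quantized tensor -> float tensor
        x = self.module(x)    # ordinary float computation
        return self.quant(x)  # float tensor -> quantized tensor

def set_module(model, name, new_module):
    """Replace the submodule at dotted path `name` with `new_module`."""
    parent_name, _, child_name = name.rpartition(".")
    parent = model.get_submodule(parent_name) if parent_name else model
    setattr(parent, child_name, new_module)

# before prepare(): wrap each skipped layer instead of only clearing its qconfig
for name in list(sortedSensitivityDict):
    float_mod = fusedModel.get_submodule(name)
    float_mod.qconfig = None
    set_module(fusedModel, name, DequantWrapper(float_mod))
```

Does this look like the right approach, or is there a more standard way to do it in eager mode?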