Question on skipping quantization on unsupported modules

Let’s say I have a module block where part of it is not currently supported for quantization, so I added a QuantStub and DeQuantStub as shown below.

class ExampleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = Supported(x)        # quantizable part
        x = self.dequant(x)     # leave the quantized domain
        x = NotSupported(x)     # layer without quantized support
        x = self.quant(x)       # re-enter the quantized domain
        return x

I have two questions. First, I am using this module as a building block for my network, so it is called repeatedly. I learned on this forum that a separate QuantStub instance is needed at each place it is used. Does that mean I have to unroll the entire network wherever this module is used?

Secondly, do I have to set m.ExampleBlock.qconfig = None, and do the same for every place the block is used, to skip quantization on the NotSupported layers and functions?
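
For concreteness, here is a minimal sketch of what I mean, assuming eager-mode post-training static quantization (SomeNetwork and the layer1/layer2 attribute names are just illustrative):

import torch

m = SomeNetwork()  # hypothetical network built from ExampleBlock instances
m.qconfig = torch.quantization.get_default_qconfig("fbgemm")
# what I am asking about: disabling quantization on every block instance
m.layer1.qconfig = None
m.layer2.qconfig = None
m_prepared = torch.quantization.prepare(m)
# ... run calibration data through m_prepared ...
m_quantized = torch.quantization.convert(m_prepared)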

Let me know if I explained my questions well enough.

Best,
Hua

Could you please clarify what you mean by “unrolling the entire network” here?

> Secondly, do I have to set m.ExampleBlock.qconfig = None, and do the same for every place the block is used, to skip quantization on the NotSupported layers and functions?

I don’t think this is necessary (I believe you’ll get an exception at runtime if you try to quantize something that isn’t quantizable), if I understand your question correctly. Are you observing errors without this explicit specification?

Hi David:
Thanks for your reply; what I meant is the following. If I call this module, which has layers unsupported for quantization, as below, do I have to write separate classes ExampleBlock1(), ExampleBlock2(), etc., or can I use the single ExampleBlock() and have the QuantStub and DeQuantStub behave correctly in each instance? I am asking because I am quantizing a model right now; some of the layers are not supported, so I used dequant()/quant() to bypass them, but the result is very poor.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = ExampleBlock()
        self.layer2 = ExampleBlock()

Hi Hua. I don’t think you need to wrap unsupported layers with dequant and quant stubs. I’ll confirm with the team. Did you have issues when you didn’t use the stubs?

Edit: Sorry, I think I was wrong here. You’ll get an error if you try to pass the quantized output of a supported layer as an argument to a non-quantized layer. Is that what you’re asking?
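
To illustrate, here is a minimal sketch of that failure mode (softplus is just an example of an op that, as far as I know, has no quantized kernel):

import torch

qx = torch.quantize_per_tensor(torch.randn(1, 4), scale=0.1, zero_point=0,
                               dtype=torch.quint8)
try:
    # applying a float-only op to a quantized tensor
    torch.nn.functional.softplus(qx)
except RuntimeError as e:
    # e.g. "Could not run 'aten::softplus' with arguments from the
    # 'QuantizedCPU' backend ..."
    print(e)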

Hi Hua,

With Eager mode, inserting quant/dequant stubs works for selective quantization. Can you clarify what you mean by “the result is very poor”? There are a few different ways to diagnose “poor performance” when using quantized models (see PyTorch Numeric Suite Tutorial — PyTorch Tutorials 1.10.1+cu102 documentation).
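
For example, a minimal weight comparison with the Numeric Suite might look like this (float_model and quantized_model are placeholders for your own models; compute_error follows the tutorial):

import torch
import torch.quantization._numeric_suite as ns

def compute_error(x, y):
    # signal-to-quantization-noise ratio in dB; lower means more degradation
    Ps = torch.norm(x)
    Pn = torch.norm(x - y)
    return 20 * torch.log10(Ps / Pn)

wt_compare = ns.compare_weights(float_model.state_dict(),
                                quantized_model.state_dict())
for key in wt_compare:
    print(key, compute_error(wt_compare[key]["float"],
                             wt_compare[key]["quantized"].dequantize()))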

Re: using separate instances of ExampleBlock: I think separate instances are necessary if you want them to have different weights.

I find FX mode easier for selective quantization. In your example, I’d use it like this:

import copy
import torch
import torch.nn as nn
from torch.quantization import quantize_fx

# skip quantization on the NotSupported and Linear modules
qconfig_dict = {
    "": torch.quantization.get_default_qconfig("fbgemm"),  # global config
    "object_type": [(NotSupported, None), (torch.nn.Linear, None)],
}

prepared = quantize_fx.prepare_fx(ExampleBlock(), qconfig_dict)
# ... calibrate by running representative data through `prepared` ...
quantized_block = quantize_fx.convert_fx(prepared)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # convert_fx returns a module instance, not a class; use a deep
        # copy if the two layers should have independent weights
        self.l1 = quantized_block
        self.l2 = copy.deepcopy(quantized_block)

This might be a helpful reference: Practical Quantization in PyTorch | PyTorch

Hi David:
Yes, that’s right. I added dequant and quant stubs before and after the unsupported layer to bypass quantization. I was able to quantize and save the model. But when I load the quantized model with torch.jit.load, I encountered the “Could not run on Quantized CPU” error, which is very confusing.
By the way, my block configuration is “Conv + ReLU + BatchNorm”, as in the Fuse_modules more sequence support thread. Since this configuration is not supported for fusion, I fused “Conv + ReLU” and bypassed quantization for BatchNorm. I wonder if I did anything wrong here.
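
For reference, the fusion call I used looks roughly like this (the "conv"/"relu" submodule names are placeholders for my block’s attributes):

import torch

# fuse only Conv + ReLU; BatchNorm is left unfused and bypassed
# with dequant/quant in forward()
fused_block = torch.quantization.fuse_modules(float_block, [["conv", "relu"]])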

Best,
Hua

Hi Suraj:
Thanks for your reply; I will try the Numeric Suite out.

> Re: using separate instances of ExampleBlock: I think separate instances are necessary if you want them to have different weights.

Could you please clarify this a bit more? Do I need to write ExampleBlock1, ExampleBlock2, etc., since they have quant and dequant stubs inside, or do I just need to write one ExampleBlock and instantiate it as in your code?

Best
Hua

I think you can get away with initializing multiple instances of the same ExampleBlock, as long as that serves your purpose. Each instance will be architecturally identical, but will have its own parameters and its own quant/dequant stubs. I don’t think you need to create separate classes for each, based on the example you’ve provided!
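
A quick way to convince yourself, assuming ExampleBlock contains at least one parameterized layer (a sketch, not a definitive check):

import torch

b1, b2 = ExampleBlock(), ExampleBlock()
for (n1, p1), (n2, p2) in zip(b1.named_parameters(), b2.named_parameters()):
    assert n1 == n2                      # same structure
    print(n1, torch.equal(p1, p2))       # usually False: independent weights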