Let’s say I have a module block where part of it is not currently supported for quantization, so I added QuantStub and DeQuantStub as shown below.
```python
class ExampleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = Supported(x)       # quantizable part of the block
        x = self.dequant(x)
        x = NotSupported(x)    # unsupported part runs in float
        x = self.quant(x)
        return x
```
I have two questions. First, I am using this module as a building block for my network, meaning it is called repeatedly. I learned on this forum that QuantStubs need a separate instance for each place they are used — does that mean I have to unroll the entire network wherever this module is used?
Secondly, do I have to set m.ExampleBlock.qconfig = None, and do so for every place it is called, to skip quantization on the NotSupported layers and functions?
Let me know if I explained my questions well enough.
Could you please clarify what you mean by “unrolling the entire network” here?
> Secondly, do I have to specify in the m.ExampleBlock.qconfig = None, and all the cases where it has been called to skip the quantization on NotSupported layers and functions?
I don’t think this is necessary (I believe you’ll get an exception at runtime if you try to quantize something that’s not quantizable), if I understand your question correctly. Are you observing errors without this explicit specification?
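For what it’s worth, in eager mode you can exclude a submodule by setting its qconfig to None before prepare. A minimal sketch (the Tiny module and its layer names are made up for illustration; a second conv stands in for an unsupported layer):

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.supported = nn.Conv2d(1, 2, 1)
        self.dequant = torch.quantization.DeQuantStub()
        self.not_supported = nn.Conv2d(2, 2, 1)  # stand-in for an unsupported layer

    def forward(self, x):
        x = self.supported(self.quant(x))
        x = self.not_supported(self.dequant(x))  # runs in float
        return x

m = Tiny().eval()
m.qconfig = torch.quantization.get_default_qconfig("fbgemm")
m.not_supported.qconfig = None           # skip quantization for this submodule
prepared = torch.quantization.prepare(m)
prepared(torch.randn(1, 1, 4, 4))        # calibration pass
quantized = torch.quantization.convert(prepared)

# 'supported' is now a quantized conv; 'not_supported' stays a float nn.Conv2d
out = quantized(torch.randn(1, 1, 4, 4))
```

The dequant stub before the skipped layer is still needed, since the float conv cannot consume the quantized tensor produced upstream.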
Hi David:
Thanks for your reply. What I meant is the following: if I call this module, which has layers unsupported for quantization, as below, do I have to write ExampleBlock1(), ExampleBlock2(), etc., or can I just use the single ExampleBlock() and have the QuantStub and DeQuantStub behave accordingly? I am asking because I am quantizing a model right now; some of the layers are not supported, so I used dequant()/quant() to bypass them, but the result is very poor.
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = ExampleBlock()
        self.layer2 = ExampleBlock()
```
Hi Hua. I don’t think you need to wrap unsupported layers with dequant and quant stubs. I’ll confirm with the team. Did you have issues when you didn’t use the stubs?
Edit: Sorry, I think I was wrong here. You’ll get an error if you try to pass the quantized output of a supported layer as an argument to a non-quantized layer. Is that what you’re asking?
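That error can be reproduced directly: a quantized tensor can’t be fed to a float-only op, which is why a DeQuantStub (or .dequantize()) is needed first. A small sketch:

```python
import torch

x = torch.randn(2, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

float_linear = torch.nn.Linear(4, 4)  # an ordinary (non-quantized) layer
try:
    float_linear(qx)                  # quantized input into a float op fails
except RuntimeError as e:
    print("fails:", type(e).__name__)

out = float_linear(qx.dequantize())   # dequantize first, then it works
```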
With Eager mode, inserting quant/dequant stubs works for selective quantization. Can you clarify what you mean by “the result is very poor”? There are a few different ways to diagnose “poor performance” when using quantized models (see PyTorch Numeric Suite Tutorial — PyTorch Tutorials 1.10.1+cu102 documentation).
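As a rough sketch of what the Numeric Suite gives you, here is a layer-by-layer comparison of float vs. quantized weights (the Small module is made up for illustration; the API follows the 1.10-era tutorial and lives in a private module, so it may move between releases):

```python
import torch
import torch.nn as nn
import torch.quantization._numeric_suite as ns

class Small(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 2, 1)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

float_model = Small().eval()
float_model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(float_model)
prepared(torch.randn(1, 1, 4, 4))  # calibrate
qmodel = torch.quantization.convert(prepared)

# maps each weight name to its float and quantized versions
wt_compare = ns.compare_weights(float_model.state_dict(), qmodel.state_dict())
for name, d in wt_compare.items():
    print(name, d["float"].shape, d["quantized"].shape)
```

Comparing the two versions of each weight (e.g. via SQNR, as the tutorial does) helps localize which layer loses the most precision.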
Re: using separate instances of ExampleBlock: I think it is necessary if you have different weights.
I find using FX mode easier for selective quantization. In your example, I’d use it like:

```python
# skip quantization on the NotSupported and Linear modules
qconfig_dict = {
    "": torch.quantization.get_default_qconfig("fbgemm"),  # global config
    "object_type": [(NotSupported, None), (torch.nn.Linear, None)],
}
prep = quantize_fx.prepare_fx(ExampleBlock(), qconfig_dict)
# calibrate ...
quantized_example_block = quantize_fx.convert_fx(prep)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # convert_fx returns a module instance; prepare/convert a
        # separate ExampleBlock for each layer if the weights differ
        self.l1 = quantized_example_block
        self.l2 = quantized_example_block
```
Hi David:
Yes, that’s right. I added dequant and quant stubs before and after the unsupported layer to bypass quantization. I was able to quantize and save the model, but when I jit-load the quantized model I encounter the “Could not run on Quantized CPU” error, which is very confusing.
By the way, my block configuration is “Conv + ReLU + BatchNorm”, as in Fuse_modules more sequence support. Since this ordering is not supported for fusion, I fused “Conv + ReLU” and bypassed quantization for BatchNorm. I wonder if I did anything wrong here.
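For reference, eager mode supports fusing (Conv, BatchNorm, ReLU) in that module order, but not the (Conv, ReLU, BatchNorm) ordering described here, so fusing just the Conv+ReLU pair is a reasonable workaround. A minimal sketch of that partial fusion (Block is a made-up stand-in):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    # same ordering as in the thread: Conv -> ReLU -> BatchNorm
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm2d(3)

    def forward(self, x):
        return self.bn(self.relu(self.conv(x)))

m = Block().eval()
# fuse only the supported Conv+ReLU pair; bn is left as a float module
fused = torch.quantization.fuse_modules(m, [["conv", "relu"]])
print(type(fused.conv).__name__)  # the fused ConvReLU2d module
print(type(fused.relu).__name__)  # the original relu slot becomes Identity
```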
Hi Suraj:
Thanks for your reply, I will try the Numeric Suite out.
> Re: using separate instances of ExampleBlock I think it is necessary if you have different weights.

Could you please clarify this a bit more? Do I need to write ExampleBlock1, ExampleBlock2, etc., since they have quant and dequant stubs inside, or do I just need to write one ExampleBlock and call it as in your code?
I think you can get away with initializing multiple instances of the same ExampleBlock, as long as that serves your purpose. Each instance will be identical. I don’t think you need to create separate classes for each, based on the example you’ve provided!
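To illustrate the point: two instances of the same nn.Module class share the class definition (including its stubs) but each get their own parameter tensors, so a single class is enough. A tiny sketch with a simplified stand-in block:

```python
import torch.nn as nn

class ExampleBlock(nn.Module):  # simplified stand-in for the block above
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

l1 = ExampleBlock()
l2 = ExampleBlock()
# separate parameter storage, independently initialized
assert l1.linear.weight.data_ptr() != l2.linear.weight.data_ptr()
```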