- PyTorch version: 1.8.1
I’ve applied static post-training quantization (PTQ) by modifying the source code of the Swin Transformer from mmclassification (inserting de/quant stubs). I’ve had to change the source code of quite a few other files as well, because the model is neither standard nor simple.
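For reference, the stub-insertion pattern looks roughly like this on a toy module (this is a hypothetical minimal module, not the actual Swin code):

```python
import torch
import torch.nn as nn

class StubWrapped(nn.Module):
    """Toy module showing the de/quant stub pattern used for eager-mode PTQ."""
    def __init__(self):
        super().__init__()
        # QuantStub marks where float tensors become quantized after convert()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.relu = nn.ReLU()
        # DeQuantStub marks where quantized tensors become float again
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # float -> quantized (after conversion)
        x = self.relu(self.fc(x))
        return self.dequant(x)   # quantized -> float
```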
I used this function to perform the quantization of a loaded model:
```python
def static_quantize(m, data_loader):
    backend = 'qnnpack'
    torch.backends.quantized.engine = backend
    m.eval()
    m.qconfig = torch.quantization.get_default_qconfig(backend)
    torch.quantization.prepare(m, inplace=True)
    # Calibrate the observers on ~100 batches
    with torch.no_grad():
        for i, data in enumerate(data_loader):
            result = m(return_loss=False, **data)
            if i > 100:
                break
    torch.quantization.convert(m, inplace=True)
    return m  # I realize the return is unnecessary, since conversion is in-place
```
However, I’ve noticed a significant drop in accuracy (around 30%). To combat this I would like to quantize selectively, i.e. skip the quantization process for certain layers that are problematic. I’ve noticed that `prepare` and `convert` effectively quantize everything they can in the model, recursively.
For me the simplest approach would be to comment out the de/quant stubs in the model source code. Of course this doesn’t actually work, because the stubs aren’t what determines which layers get quantized.
So how can I tell `prepare` which layers to skip? Furthermore, how can I tell it to skip one `Linear` layer but quantize some other `Linear` layer (if type-level granularity is not enough)?
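To make the granularity I’m after concrete, here is a toy sketch of the kind of per-module opt-out I’m hoping exists (the `qconfig = None` line is my guess at the intended mechanism; the two-layer model is purely illustrative, not Swin):

```python
import torch
import torch.nn as nn

# Hypothetical toy model: two Linear layers, of which I want to
# quantize only the second one.
toy = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))
toy.eval()

torch.backends.quantized.engine = 'qnnpack'
toy.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# Guess: clearing qconfig on one submodule before prepare() should
# exclude just that instance from quantization.
toy[0].qconfig = None

torch.quantization.prepare(toy, inplace=True)
with torch.no_grad():
    toy(torch.randn(2, 8))   # minimal calibration pass
torch.quantization.convert(toy, inplace=True)
```

If this works as I hope, `toy[0]` stays a float `nn.Linear` while `toy[1]` is swapped for its quantized counterpart.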