- PyTorch version: 1.8.1
I’ve applied static post-training quantization (PTQ) by modifying the source code of the Swin Transformer from mmclassification (inserting de/quant stubs). I’ve had to change the source code of quite a few other files as well, because the model is neither standard nor simple.
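For reference, the stub-insertion pattern looks roughly like this on a toy module (this is a hypothetical minimal module, not the actual Swin code):

```python
import torch
import torch.nn as nn

class StubWrapped(nn.Module):
    """Toy module showing the de/quant stub pattern used for eager-mode PTQ."""
    def __init__(self):
        super().__init__()
        # QuantStub marks where float tensors become quantized after convert()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.relu = nn.ReLU()
        # DeQuantStub marks where quantized tensors become float again
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # float -> quantized (after conversion)
        x = self.relu(self.fc(x))
        return self.dequant(x)   # quantized -> float
```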
I used this function to perform the quantization of a loaded model:
```python
def static_quantize(m, data_loader):
    backend = 'qnnpack'
    torch.backends.quantized.engine = backend
    m.eval()
    m.qconfig = torch.quantization.get_default_qconfig(backend)
    torch.quantization.prepare(m, inplace=True)
    # Calibrate the observers on ~100 batches
    with torch.no_grad():
        for i, data in enumerate(data_loader):
            result = m(return_loss=False, **data)
            if i > 100:
                break
    torch.quantization.convert(m, inplace=True)
    return m  # I realize the return is unnecessary, since conversion is in-place
```
However, I’ve noticed a significant drop in accuracy (around 30%). To combat this I would like to quantize selectively, i.e. skip the quantization process for certain layers that are problematic. I’ve noticed that `prepare` and `convert` effectively quantize everything they can in the model, recursively.
For me the simplest approach would be to comment out the de/quant stubs in the model source code. Of course this doesn’t actually work, because the stubs aren’t what determines which layers get quantized.
So how can I tell `prepare` which layers to skip? Furthermore, how can I tell it to skip one `Linear` layer but quantize some other `Linear` layer (if type-level granularity is not enough)?
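To make the granularity I’m after concrete, here is a toy sketch of the kind of per-module opt-out I’m hoping exists (the `qconfig = None` line is my guess at the intended mechanism; the two-layer model is purely illustrative, not Swin):

```python
import torch
import torch.nn as nn

# Hypothetical toy model: two Linear layers, of which I want to
# quantize only the second one.
toy = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))
toy.eval()

torch.backends.quantized.engine = 'qnnpack'
toy.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# Guess: clearing qconfig on one submodule before prepare() should
# exclude just that instance from quantization.
toy[0].qconfig = None

torch.quantization.prepare(toy, inplace=True)
with torch.no_grad():
    toy(torch.randn(2, 8))   # minimal calibration pass
torch.quantization.convert(toy, inplace=True)
```

If this works as I hope, `toy[0]` stays a float `nn.Linear` while `toy[1]` is swapped for its quantized counterpart.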