Hi. I am having a hard time wrapping my head around model quantization. To get to the point: I have a basic ResNet model that I want to optimize:
encoder = EncoderCNN()
encoder.load_state_dict(torch.load(os.path.join('models', encoder_file)))
encoder.eval()
encoder.to(device)
dummy_input = torch.randn(1, 3, 480, 480, device='cuda')
with torch.jit.optimized_execution(True):
    encoder = torch.jit.trace(encoder, dummy_input)
encoder.save("models/encoder.pt")
encoder = torch.jit.load(os.path.join('models', 'encoder.pt'), map_location=torch.device('cuda'))
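Tracing itself seems to work. Here is a minimal round-trip check I did on a toy stand-in module (shapes are made up, since I can't paste EncoderCNN here) showing that the saved-and-reloaded trace matches the eager model:

```python
import torch
import torch.nn as nn

# toy stand-in for EncoderCNN, just to check the trace/save/load round trip
net = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU()).eval()
x = torch.randn(1, 3, 16, 16)

traced = torch.jit.trace(net, x)
traced.save('toy_encoder.pt')
reloaded = torch.jit.load('toy_encoder.pt')

with torch.no_grad():
    same = torch.allclose(net(x), reloaded(x))
```

So the trace/save/load part is presumably not where my problem is.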
And this is how I figured I would quantize it:
if use_fbgemm:
    quantization_config = torch.quantization.get_default_qconfig('fbgemm')
    torch.backends.quantized.engine = 'fbgemm'
else:
    quantization_config = torch.quantization.get_default_qconfig('qnnpack')
    torch.backends.quantized.engine = 'qnnpack'
quantization_config.quant_min = 0.0
quantization_config.quant_max = 1.0
encoder.qconfig = quantization_config
torch.quantization.prepare(encoder, inplace=True)
torch.quantization.convert(encoder, inplace=True) # This line gets a warning
But the last line throws the warning
UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
This is located in torch\ao\quantization\observer.py:216
and the following lines don’t help:
quantization_config.quant_min = 0.0
quantization_config.quant_max = 1.0
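For reference, this is my mental model of the standard eager-mode static quantization recipe from the docs, tried on a toy module (again a stand-in, not my real network) with QuantStub/DeQuantStub markers, a calibration pass, then convert:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # marks where fp32 -> int8
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # marks where int8 -> fp32

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = TinyNet().eval()  # static quantization wants eval mode

# pick whichever backend this machine supports (fbgemm on x86, qnnpack on ARM)
engine = 'fbgemm' if 'fbgemm' in torch.backends.quantized.supported_engines else 'qnnpack'
model.qconfig = torch.quantization.get_default_qconfig(engine)
torch.backends.quantized.engine = engine

prepared = torch.quantization.prepare(model)
with torch.no_grad():                       # calibration pass with sample data
    prepared(torch.randn(4, 3, 32, 32))
quantized = torch.quantization.convert(prepared)

out = quantized(torch.randn(1, 3, 32, 32))  # float in, float out
```

This runs without the warning for me, but I don't see how to map it onto my traced encoder.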
The doc on this was not very clear to me, and I tried to change it to:
torch.quantization.prepare(encoder, inplace=True)
torch.quantization.convert(
    encoder,
    inplace=True,
    convert_custom_config_dict={
        '_custom_module_class': {'EncoderCNN': encoder}  # I want this as my custom module?
    },
)
But the warning persists. Perhaps more importantly, the model is not getting any faster, judging by FPS, so I suspect I have set this up wrong to begin with.
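In case the benchmark itself is the problem, this is roughly how I measure FPS (measure_fps is my own helper, not a torch API):

```python
import time

def measure_fps(fn, n_warmup=10, n_iters=100):
    """Crude throughput measurement for a zero-argument callable."""
    for _ in range(n_warmup):  # warm-up so lazy init / JIT doesn't skew timing
        fn()
    # note: for a CUDA model, torch.cuda.synchronize() would be needed before
    # reading the clock, since kernel launches are asynchronous
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

# usage: measure_fps(lambda: encoder(dummy_input))
```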
If you could point me to any docs or examples that walk through something like this, I would be grateful.
I know there already are quantized ResNet models available, but it’s important for me that I can apply this to a custom network with modified layers.
Thank you