Quantization config

Hi. I'm having a hard time wrapping my head around quantizing models. To get to the point: I have a basic ResNet model that I want to optimize:

import os
import torch

device = torch.device('cuda')

encoder = EncoderCNN()  # my custom ResNet-based encoder
encoder.load_state_dict(torch.load(os.path.join('models', encoder_file)))

encoder.eval()
encoder.to(device)

dummy_input = torch.randn(1, 3, 480, 480, device='cuda')
with torch.jit.optimized_execution(True):
    encoder = torch.jit.trace(encoder, dummy_input)
    encoder.save("models/encoder.pt")

encoder = torch.jit.load(os.path.join('models', 'encoder.pt'), map_location=torch.device('cuda'))

And this is how I figured I would quantize it:

if use_fbgemm:
    quantization_config = torch.quantization.get_default_qconfig('fbgemm')
    torch.backends.quantized.engine = 'fbgemm'
else:
    quantization_config = torch.quantization.get_default_qconfig('qnnpack')
    torch.backends.quantized.engine = 'qnnpack'

quantization_config.quant_min = 0.0
quantization_config.quant_max = 1.0
encoder.qconfig = quantization_config

torch.quantization.prepare(encoder, inplace=True)
torch.quantization.convert(encoder, inplace=True)   # This line gets a warning

But the last line raises this warning:

UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.

It is raised from torch\ao\quantization\observer.py:216, and the following lines don't help:

quantization_config.quant_min = 0.0
quantization_config.quant_max = 1.0
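For what it's worth, my (unverified) reading of the warning is that quant_min/quant_max are integer quantization levels that belong on the observer constructors, passed via with_args when building a QConfig, rather than attributes set on the config afterwards. Something like this, though I'm not sure it's right:

```python
import torch
from torch.quantization import QConfig, MinMaxObserver, PerChannelMinMaxObserver

# quant_min/quant_max are integer quantization levels (e.g. 0..255 for quint8),
# not a float input range, and they are passed to the observer constructors
custom_qconfig = QConfig(
    activation=MinMaxObserver.with_args(
        dtype=torch.quint8, quant_min=0, quant_max=255),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric,
        quant_min=-128, quant_max=127),
)
# then: encoder.qconfig = custom_qconfig
```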

The documentation on this was not very clear to me, so I tried changing the call to:

torch.quantization.prepare(encoder, inplace=True)
torch.quantization.convert(
    encoder,
    inplace=True,
    convert_custom_config_dict={
        '_custom_module_class': {'EncoderCNN': encoder}  # I want this as my custom module?
    },
)
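From the prepare()/convert() docstrings, it looks like the custom-module hooks use different keys than what I tried, and map classes to classes (not names to instances). My reading, which I haven't verified, is roughly this shape (ObservedEncoderCNN and QuantizedEncoderCNN are hypothetical wrapper classes I would have to write):

```python
import torch.nn as nn

class EncoderCNN(nn.Module):  # stand-in for my real encoder
    pass

class ObservedEncoderCNN(nn.Module):  # hypothetical observed wrapper
    @classmethod
    def from_float(cls, float_module):
        return cls()

class QuantizedEncoderCNN(nn.Module):  # hypothetical quantized wrapper
    @classmethod
    def from_observed(cls, observed_module):
        return cls()

# prepare() and convert() each take their own mapping, keyed by class
prepare_custom_config_dict = {
    'float_to_observed_custom_module_class': {EncoderCNN: ObservedEncoderCNN}
}
convert_custom_config_dict = {
    'observed_to_quantized_custom_module_class': {ObservedEncoderCNN: QuantizedEncoderCNN}
}
```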

But the warning persists. Perhaps more importantly, the model is not getting any faster judging by FPS, so I suspect I have set this up wrong from the start.
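In case it matters, this is roughly how I estimate FPS (a simple CPU timing loop; the iteration counts and input size are arbitrary choices on my part):

```python
import time
import torch
import torch.nn as nn

def measure_fps(model, input_shape=(1, 3, 480, 480), iters=50, warmup=5):
    """Crude CPU throughput estimate: images per second on random input."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):  # warm-up runs, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return iters / elapsed

# tiny stand-in model just to show usage
fps = measure_fps(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()),
                  input_shape=(1, 3, 64, 64), iters=10)
```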

If you could point me to any docs or examples that walk through something like this, I would be grateful.

I know quantized ResNet models are already available, but it's important for me to be able to apply this to a custom network with modified layers.
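For reference, here is the minimal toy version of the flow I believe I'm supposed to follow (SmallCNN is just a stand-in for my encoder, and the calibration loop uses random data); if I'm misunderstanding a step here, that's probably the root of my problem:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny placeholder model standing in for EncoderCNN."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = SmallCNN().eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.backends.quantized.engine = 'fbgemm'

torch.quantization.prepare(model, inplace=True)

# calibration: run representative inputs so the observers record ranges
with torch.no_grad():
    for _ in range(4):
        model(torch.randn(1, 3, 32, 32))

torch.quantization.convert(model, inplace=True)
out = model(torch.randn(1, 3, 32, 32))
```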

Thank you