Convert back to Unquantized model

Hello. I have a question about torch.quantization.convert.

For a model like this,
(module): LeNet(
(l1): Linear(in_features=784, out_features=10, bias=True)
(relu1): ReLU(inplace=True)
)

After QAT and convert, I got
(module): LeNet(
(l1): QuantizedLinear(in_features=784, out_features=10, scale=0.5196203589439392, zero_point=78, qscheme=torch.per_channel_affine)
(relu1): QuantizedReLU(inplace=True)
)

But I’m looking for a way to run evaluation on CUDA, and for that I need to convert the model back to the pre-QAT architecture, yet with ‘quantized FP32’ weights and perhaps a custom forward hook to perform activation quantization. Can someone advise on the best way to achieve this? As I understand it, the steps are the ones below, but I’d like to make sure I’m not reinventing the wheel here.

  • Write a new converter that rebuilds the pre-QAT model architecture and loads the quantized weights (but in FP32).
  • Add a forward pre-hook that quantizes activations using the scale/zero_point from activation_post_process (should it be a forward pre-hook, or a regular forward hook?); a rough sketch of this idea follows the list.
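
For what it’s worth, here is a minimal sketch of what the pre-hook idea could look like, assuming per-tensor affine quantization of the activations; make_quantize_prehook is a hypothetical helper, and the scale/zero_point values would be read from the module’s activation_post_process observer:

```python
import torch

def make_quantize_prehook(scale, zero_point, quant_min=0, quant_max=255):
    # Hypothetical helper: returns a forward pre-hook that fake-quantizes
    # the incoming activation in FP32, mimicking int8 quantization.
    def hook(module, inputs):
        x = inputs[0]
        x_q = torch.fake_quantize_per_tensor_affine(
            x, scale, zero_point, quant_min, quant_max)
        return (x_q,) + inputs[1:]
    return hook

# Attach to a layer; scale/zero_point here are illustrative values only:
# handle = model.l1.register_forward_pre_hook(
#     make_quantize_prehook(scale=0.5196, zero_point=78))
```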

Any suggestions would be appreciated!

Hi @thyeros, you can use the QAT model after prepare and before convert to evaluate in FP32 while emulating int8. It models the quantized numerics in FP32 with the FakeQuantize modules, and it works on CUDA. Here is an example from torchvision: https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py#L134
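
For concreteness, a minimal sketch of that flow, where LeNet, the training loop, and images are placeholders:

```python
import torch
import torch.quantization as tq

model = LeNet()  # placeholder: the FP32 model from the question
model.qconfig = tq.get_default_qat_qconfig('fbgemm')
model.train()
tq.prepare_qat(model, inplace=True)  # inserts FakeQuantize modules

# ... run QAT fine-tuning here ...

# Evaluate the *prepared* model directly on CUDA; the FakeQuantize
# modules emulate int8 numerics in FP32, no convert() needed here.
model.eval().cuda()
with torch.no_grad():
    logits = model(images.cuda())  # 'images' is a placeholder batch
```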

Oh, I see. Just using the QAT model in eval mode will run just fine on CUDA, with all the int8 emulation effects. So there is literally nothing special to do, is that right?

In terms of eval speed on CUDA, would it still be a good idea to drop the observers, or would just disabling them be sufficient?

Thanks!

In terms of eval speed on CUDA, would it still be a good idea to drop the observers, or would just disabling them be sufficient?

You can use model.apply(torch.quantization.disable_observer) and model.apply(torch.quantization.enable_observer) to toggle them (example: https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py#L128).
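
In context, the toggle could look like this (evaluate is a placeholder for your eval loop):

```python
# Freeze observer statistics for evaluation, then re-enable them
# if QAT training continues afterwards.
model.apply(torch.quantization.disable_observer)
evaluate(model)  # placeholder: your evaluation loop
model.apply(torch.quantization.enable_observer)
```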

I’m already disabling them. Great, thanks! :+1: