I have successfully called torch.ao.quantization.convert. However, I expected the converted model to replace my nn.Linear and nn.Conv2d layers with versions whose weights are stored as int8. Instead, my layers are now QuantizedLinear and QuantizedConv2d modules.
See the Quantization — PyTorch 2.1 documentation; in particular:
"Convert the observed model to a quantized model. This does several things: quantizes the weights, computes and stores the scale and bias value to be used with each activation tensor, and replaces key operators with quantized implementations."
model_int8 = torch.quantization.convert(model_fp32_prepared)
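A minimal sketch of the eager-mode static quantization flow, assuming an x86 machine (the "fbgemm" backend) and a toy model with QuantStub/DeQuantStub; the point is that after convert() the module class changes to a quantized implementation, and the int8 weight lives inside it, retrievable via the module's weight() method as a quantized tensor:

```python
import torch
import torch.nn as nn

# Toy model wrapped with quant/dequant stubs so eager-mode static
# quantization knows where the quantized region begins and ends.
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.conv = nn.Conv2d(1, 1, 1)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model_fp32 = M().eval()
model_fp32.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")

# Insert observers, run a calibration pass, then convert.
model_prepared = torch.ao.quantization.prepare(model_fp32)
model_prepared(torch.randn(4, 1, 8, 8))  # calibration with sample data
model_int8 = torch.ao.quantization.convert(model_prepared)

# The module is now a quantized Conv2d; its packed weight is qint8.
print(model_int8.conv)                  # prints a QuantizedConv2d module
print(model_int8.conv.weight().dtype)   # torch.qint8
```

So the QuantizedLinear / QuantizedConv2d classes you see are expected: they are the quantized implementations that convert() swaps in, and they hold the int8 weights internally (along with the scale and zero-point) rather than exposing an int8 .weight attribute the way the float modules do.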