In torch.fx quantization with PyTorch, the float16 data type only supports running on the GPU, but my experiments suggest the GPU cannot support int8 quantization. I would therefore like to implement FP16+INT8 quantization of the PETR model on GPU (x86) devices. Is this solution feasible?
Yes, it's feasible, but it is currently not part of PyTorch. We are working on a prototype solution for int8 GPU dynamic quantization and will share an update shortly.
Thank you very much for your answer; it has been very helpful. Can I use a custom method to quantize to fp16 on the CPU? Or quantize with another data type on the CPU, such as torch.bfloat16?
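For context, here is a minimal sketch of what I mean, using a small toy model in place of PETR (the real model is much larger): casting the model to bfloat16 on the CPU with plain `nn.Module.to`, alongside the int8 dynamic-quantization path that PyTorch already supports on CPU via `torch.ao.quantization.quantize_dynamic`.

```python
import torch
import torch.nn as nn

def make_model():
    # Toy stand-in for PETR, just to illustrate dtypes.
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8)).eval()

# Option 1: run in bfloat16 on the CPU by casting the module's parameters;
# no quantization API is involved, inputs must be cast to match.
bf16_model = make_model().to(torch.bfloat16)
x = torch.randn(4, 16, dtype=torch.bfloat16)
with torch.no_grad():
    out_bf16 = bf16_model(x)
print(out_bf16.dtype)  # torch.bfloat16

# Option 2: int8 dynamic quantization of the Linear layers on the CPU,
# which is a supported PyTorch path; outputs are dequantized to float32.
int8_model = torch.ao.quantization.quantize_dynamic(
    make_model(), {nn.Linear}, dtype=torch.qint8
)
with torch.no_grad():
    out_int8 = int8_model(torch.randn(4, 16))
print(out_int8.dtype)  # torch.float32
```

Is this kind of dtype casting the right way to get fp16/bfloat16 behavior on the CPU, or is there a recommended quantization-API route?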