How to implement fp16 quantization on CPU

Relissc · August 2, 2023, 11:13am

In PyTorch's torch.fx quantization, the float16 data type is only supported on GPU, but in my experiments the GPU does not seem to support int8 quantization. I would therefore like to implement mixed FP16 + INT8 quantization of the PETR model on GPU (x86) devices. Is this feasible?

supriyar · August 4, 2023, 6:58pm

Yes, it's feasible, but it is not currently part of PyTorch. We are working on a prototype solution for int8 GPU dynamic quantization and will share an update shortly.
cc @cdhernandez

Relissc · August 8, 2023, 1:54am

Thank you very much for your answer; it has been very helpful. Can I use a custom method to quantize to fp16 on the CPU? Or can I quantize with another data type on the CPU, such as torch.bfloat16?
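
To make the question concrete, here is a minimal sketch of what a "custom" simulated (fake) quantization to fp16 or bfloat16 could look like on CPU: each value is rounded to the 16-bit format and widened back, so the arithmetic itself stays in full precision. It uses only the Python standard library. The helper names `fake_quant_fp16` and `fake_quant_bf16` are hypothetical and not part of PyTorch, and only finite, in-range inputs are handled. In PyTorch the analogous round trip would presumably be a tensor cast such as `x.to(torch.float16).to(torch.float32)`, but that is not shown here.

```python
import struct


def fake_quant_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    struct's "e" format is binary16: packing rounds, unpacking widens back.
    Values beyond the fp16 range (about 65504) raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fake_quant_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (finite inputs only).

    bfloat16 keeps the top 16 bits of a float32, so x is first rounded
    to float32 and then rounded to nearest-even on the upper half.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (0.1, 1.0, 1000.5):
        print(value, fake_quant_fp16(value), fake_quant_bf16(value))
```

The two formats trade off differently: fp16 has more mantissa bits (finer steps near 0.1), while bfloat16 keeps the float32 exponent range, so large values that overflow fp16 remain representable.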