Is there fp8 inference in PyTorch?

Similar to fp16 inference in the PyTorch framework (as described in Training With Mixed Precision :: NVIDIA Deep Learning Performance Documentation), is there any framework or support for fp8 inference in PyTorch?

Thank you for your time.

To my knowledge, PyTorch’s mixed precision support (Automatic Mixed Precision package - torch.cuda.amp — PyTorch 1.8.1 documentation) does not handle fp8 either.
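For reference, here is a minimal sketch of what fp16 inference with `torch.cuda.amp.autocast` looks like (the toy model and shapes are hypothetical placeholders); note there is no analogous fp8 dtype to request:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for any inference network
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).cuda().eval()
x = torch.randn(8, 64, device="cuda")

# autocast runs eligible ops (e.g. nn.Linear) in fp16; fp8 is not an option here
with torch.no_grad(), torch.cuda.amp.autocast():
    y = model(x)

print(y.dtype)  # torch.float16
```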

For 8-bit precision you'd need to look toward quantization to integers or fake quantization, but that doesn't really fall under the umbrella of mixed precision. I'm not sure whether 8-bit as such is core to your request or just ancillary.

For more info about quantization, see: Quantization — PyTorch 1.8.1 documentation
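As an illustration, here is a minimal sketch of post-training dynamic quantization with `torch.quantization.quantize_dynamic`, which stores weights as int8 while keeping fp32 inputs and outputs (the toy model is again a hypothetical placeholder):

```python
import torch
import torch.nn as nn

# Hypothetical toy model; dynamic quantization targets nn.Linear (and RNN) modules
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Weights are converted to int8 ahead of time; activations are
# quantized on the fly at runtime, so inputs/outputs stay fp32
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(8, 64)
with torch.no_grad():
    y = qmodel(x)

print(y.dtype)  # torch.float32 -- results are dequantized back to fp32
```

Dynamic quantization is the lowest-effort entry point; static quantization and quantization-aware training (covered in the same docs) need calibration or retraining but quantize activations as well.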

I see. Thank you so much.