Is there fp8 inference in PyTorch?

Similar to fp16 inference in the PyTorch framework (as described in Training With Mixed Precision :: NVIDIA Deep Learning Performance Documentation), is there any framework or support for fp8 inference in PyTorch?

Thank you for your time.

To my knowledge, PyTorch’s mixed precision support (Automatic Mixed Precision package - torch.cuda.amp — PyTorch 1.8.1 documentation) does not handle fp8 either.
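For reference, here is a minimal sketch of what fp16 inference with `torch.cuda.amp.autocast` looks like (the toy model and shapes are hypothetical placeholders); note there is no analogous fp8 dtype to request:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for any inference network
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).cuda().eval()
x = torch.randn(8, 64, device="cuda")

# autocast runs eligible ops (e.g. nn.Linear) in fp16; fp8 is not an option here
with torch.no_grad(), torch.cuda.amp.autocast():
    y = model(x)

print(y.dtype)  # torch.float16
```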

For 8-bit precision you'd need to look toward quantization to integers or fake quantization, but that doesn't really fall under the umbrella of mixed precision. I'm not sure whether 8-bit as such is core to your request or just ancillary.

For more info about quantization, see: Quantization — PyTorch 1.8.1 documentation
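As an illustration, here is a minimal sketch of post-training dynamic quantization with `torch.quantization.quantize_dynamic`, which stores weights as int8 while keeping fp32 inputs and outputs (the toy model is again a hypothetical placeholder):

```python
import torch
import torch.nn as nn

# Hypothetical toy model; dynamic quantization targets nn.Linear (and RNN) modules
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Weights are converted to int8 ahead of time; activations are
# quantized on the fly at runtime, so inputs/outputs stay fp32
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(8, 64)
with torch.no_grad():
    y = qmodel(x)

print(y.dtype)  # torch.float32 -- results are dequantized back to fp32
```

Dynamic quantization is the lowest-effort entry point; static quantization and quantization-aware training (covered in the same docs) need calibration or retraining but quantize activations as well.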

I see. Thank you so much.