Is it a good idea to use float16/bfloat16 for inference?

YuA · July 31, 2024, 6:23am

Hi

I am currently using bfloat16 for inference and the model seems to be performing well (with torch.amp.autocast).
Is it common practice to use reduced precision for inference as well as for training? I am not confident about this because most articles on reduced precision focus on training.

ptrblck · July 31, 2024, 10:59am

Yes, lower-precision dtypes can be used for inference, too.
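
If you want a quick sanity check on your own setup, something like the sketch below may help. It runs the same input through a float32 forward pass and a bfloat16 autocast forward pass and reports the largest difference. The toy model, the input shape, and the use of CPU autocast are assumptions for illustration only; swap in your own model and `device_type="cuda"` as needed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
).eval()

# Hypothetical input batch: 8 samples with 16 features each.
x = torch.randn(8, 16)

with torch.no_grad():
    # Full-precision reference output.
    ref = model(x)

    # Same forward pass under bfloat16 autocast (CPU here; use "cuda" on a GPU).
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        out = model(x)

# Compare the two outputs in float32 to see how much precision was lost.
max_abs_diff = (out.float() - ref).abs().max().item()
print("autocast output dtype:", out.dtype)
print("max abs difference vs float32:", max_abs_diff)
```

What counts as an acceptable difference depends on the model and task, so it is usually better to compare your actual evaluation metric in both precisions than to rely on a fixed tolerance.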

YuA · August 1, 2024, 12:30am

Thank you for the clear answer!