Is it a good idea to use float16/bfloat16 for inference?

Hi

I am currently using bfloat16 for inference and the model seems to be performing well (with torch.amp.autocast).
Is it common practice to use reduced precision for inference as well as for training? I am not confident because most articles about reduced precision focus on training.

Yes, lower-precision dtypes such as float16 and bfloat16 can be used for inference, too.
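If you want to sanity-check this on your own model, a minimal sketch could look like the one below. It runs on CPU so it works without a GPU; the toy model, the `predict_bf16` helper, and the input shapes are made up for illustration and are not from your setup. The idea is to run the same input once in float32 and once under `torch.autocast` with bfloat16, then compare the outputs.

```python
import torch
import torch.nn as nn


def predict_bf16(model, inputs, device_type="cpu"):
    """Run a forward pass under bfloat16 autocast (hypothetical helper)."""
    model.eval()
    # inference_mode disables autograd tracking; autocast picks bfloat16
    # for the ops that support it (e.g. linear layers).
    with torch.inference_mode(), torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        return model(inputs)


torch.manual_seed(0)

# Toy stand-in for a real model (assumption, for illustration only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16)

# Full-precision reference output.
with torch.inference_mode():
    reference = model(x)

# Same input under bfloat16 autocast.
low_precision = predict_bf16(model, x)

# Largest element-wise deviation from the float32 result.
max_abs_diff = (reference - low_precision.float()).abs().max().item()
print(low_precision.dtype, max_abs_diff)
```

Note that autocast leaves the stored weights in float32 and only casts inside the supported ops. How large a deviation is acceptable depends on the model and task, so it is probably worth comparing your actual evaluation metric as well, not just raw outputs.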

Thank you for the clear answer!