Applying half-precision

Hi PyTorchers!
This may sound like a silly question, but is it possible to apply half precision (FP16) to a model that was trained in full precision (FP32), for inference only?

Or do I need to train a new model with half precision from scratch?

There is a whole area of model compression that experiments with things like this.
Have you tried it and measured how your model performs?

I have seen researchers go all the way from half precision down to binary weights. Of course, they use their own specialized training methods.

But going from full precision to half precision, I would not expect much difference in accuracy. Please update here if you have done the experiments.