Hi PyTorchers! This may sound a bit like nonsense, but is it possible to apply half precision (FP16) to a model that was trained in full precision (FP32), for inference only?
Or do I need to train a new model in half precision from scratch?
There is a whole area of model compression for experimenting with exactly this kind of thing. Have you tried it and measured your model's performance?
I have seen researchers going from half precision <–> binary; of course, they use their own training methods for that.
But for full precision <–> half precision, I would not expect much difference. Please do update here if you run the experiments.