Quantization in inference

Hi,

I’m trying to run inference in half precision on a model that was pretrained in full precision, simply by calling model.half().
I get a major drop in accuracy.
Is it possible to use half precision only at inference time?
Maybe some scaling needs to be done?
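
To illustrate what I mean by scaling, here is a minimal sketch using only the Python standard library. It assumes (I have not confirmed this for my model) that the accuracy drop comes from values falling outside the float16 range. The helper `to_half` and the factor `scale` are hypothetical names for illustration, not part of PyTorch.

```python
import struct

def to_half(x: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 (float16)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack out-of-range values; mimic a cast that
        # saturates to +/-inf instead.
        return float("inf") if x > 0 else float("-inf")

small = 1e-8     # below the smallest positive float16 value (about 6e-8)
large = 1e5      # above the largest finite float16 value (65504)
scale = 1024.0   # hypothetical scaling factor, chosen only for this demo

# Without scaling, the small value is flushed to zero and the large one overflows.
print("small as float16:", to_half(small))
print("large as float16:", to_half(large))

# Scaling up before the cast keeps the small value representable;
# dividing afterwards (in full precision) recovers it approximately.
recovered = to_half(small * scale) / scale
print("small recovered via scaling:", recovered)
```

If this is the cause, I guess some weights or activations would need to be rescaled before the cast, but I don't know how to do that properly for a whole model.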

Thanks
