Quantization in inference

kobybibas · July 17, 2018, 2:00pm

Hi,

I'm trying to use half precision when evaluating a full-precision pretrained model, simply by calling model.half().
I get a major drop in accuracy.
Is it possible to use half precision only at inference time?
Maybe some scaling needs to be done?
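
To make the scaling question concrete, here is a minimal sketch in plain Python (not PyTorch) of how float16 loses values that float32 keeps. It assumes the accuracy drop comes from fp16's limited range and precision, which I have not confirmed. `to_fp16` and `fits_fp16` are hypothetical helper names used only for illustration, and the scale factor 1024 is an arbitrary choice.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and return it as a Python float."""
    # The "e" format code of the struct module is a 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x: float) -> bool:
    """Return True if x can be packed as half precision without overflowing."""
    try:
        to_fp16(x)
        return True
    except OverflowError:
        # struct raises OverflowError for magnitudes beyond the fp16 maximum (65504).
        return False


small = 1e-8   # hypothetical tiny weight or activation
scale = 1024   # arbitrary power-of-two scale factor (an assumption)

# Without scaling, the value is below the smallest fp16 subnormal and becomes zero.
unscaled = to_fp16(small)

# Scaling up before the cast and dividing afterwards keeps most of the value.
rescaled = to_fp16(small * scale) / scale

print("unscaled:", unscaled)
print("rescaled:", rescaled)
print("70000.0 fits in fp16:", fits_fp16(70000.0))
```

If something like this is what happens inside the model, very small values would be flushed to zero and very large ones would overflow after model.half(), which is why I suspect some scaling might help.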

Thanks