Model weight size and inference time for FP16 model using apex mixed precision with optimization level O3

Hi,

I have successfully trained a model using the NVIDIA apex mixed precision plugin and saved the weights as a .pth file.
I used optimization level O3, so I expect my model weights to be saved as FP16 weights.
However, I see that the size of the .pth file is the same as that of an FP32-trained model.

Could you please help me understand this?
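
For reference, this is roughly how I checked: a minimal sketch that loads the checkpoint and prints the dtype of each saved tensor (the file path and the "state_dict" key are placeholders; your checkpoint layout may differ):

```python
import torch

# Load the checkpoint on the CPU; "model.pth" is a placeholder path.
checkpoint = torch.load("model.pth", map_location="cpu")

# If the checkpoint wraps the weights (e.g. under a "state_dict" key), unwrap it first.
state_dict = checkpoint.get("state_dict", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# torch.float16 vs. torch.float32 here tells you whether the weights were
# actually stored in half precision.
for name, tensor in state_dict.items():
    print(name, tensor.dtype, tensor.numel())
```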

Also, I haven’t timed the inference yet, but can I expect it to be faster than FP32?

My idea behind this is: I want to use TensorRT in FP16. The default path is to train in FP32 and then convert/quantize to FP16 using Torch-TensorRT (see the sketch below), but I see some performance degradation with that route. So maybe if I train natively in FP16 and then do the conversion, it will give me better results.
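
For context, the conversion step I am using looks roughly like this (the model and input shape are placeholders, and the exact Torch-TensorRT arguments may differ between versions):

```python
import torch
import torch_tensorrt
import torchvision.models as models

# Placeholder FP32 model; substitute your own trained network.
model = models.resnet18(pretrained=True).eval().cuda()

# Compile with TensorRT, allowing FP16 kernels via enabled_precisions.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},
)

x = torch.randn(1, 3, 224, 224, device="cuda")
out = trt_model(x)
```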

Is this assumption sensible?

Please let me know.

Best Regards

apex.amp is deprecated, and the O3 opt_level was used for “pure” FP16 training, which you can achieve by directly calling .half() on the model and the input data without using apex.amp.
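
Something like this minimal sketch (the model, shapes, and output path are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model; replace with your own network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()

# Cast all parameters and buffers to FP16 ("pure" FP16, comparable to apex opt_level O3).
model.half()

# Inputs must be cast to FP16 as well so dtypes match inside the model.
x = torch.randn(32, 128, device="cuda", dtype=torch.half)
out = model(x)

# Saving the state_dict now stores FP16 tensors, so the .pth file should be
# roughly half the size of an FP32 checkpoint.
torch.save(model.state_dict(), "model_fp16.pth")
```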