I have successfully trained a model using the NVIDIA Apex mixed-precision library and saved the weights as a .pth file.
I used optimization level O3, so I expected the model weights to be saved as FP16.
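For context, a minimal sketch of what I'm doing (the model and optimizer below are stand-ins for my real ones):

```python
import torch
from apex import amp

# stand-in model/optimizer; my real ones differ
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O3 casts the model itself to FP16 ("pure" half precision)
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")

# ... training loop using amp.scale_loss(loss, optimizer) ...

torch.save(model.state_dict(), "model.pth")  # weights saved here
```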
However, the size of the .pth file is the same as that of an FP32-trained model.
Could you please help me understand this?
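In case it helps, this is how I'm inspecting the stored dtypes and sizes (same file name as in the sketch above):

```python
import torch

state = torch.load("model.pth", map_location="cpu")
total_bytes = 0
for name, tensor in state.items():
    total_bytes += tensor.numel() * tensor.element_size()
    print(name, tensor.dtype)  # float16 vs. float32 tells the story
print("total parameter bytes:", total_bytes)
```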
Also, I haven't timed inference yet, but can I expect it to be faster than FP32?
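If it matters, this is roughly how I plan to time it (reusing the stand-in model from the sketch above; the input shape is a placeholder):

```python
import time
import torch

model.eval()
x = torch.randn(32, 1024, device="cuda", dtype=torch.half)  # placeholder input

with torch.no_grad():
    for _ in range(10):        # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()   # wait for GPU work before stopping the clock
print("ms/iter:", (time.perf_counter() - start) / 100 * 1000)
```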
My idea behind this: I want to deploy with TensorRT FP16. The default path is to train in FP32 and then convert/quantize to FP16 using Torch-TensorRT (roughly the call sketched below), but with that path I see some performance degradation. Hence, maybe if I train natively in FP16 and then do the conversion, it should give me better results.
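For reference, my conversion call looks roughly like this (the input shape and dtype are placeholders for my real setup):

```python
import torch
import torch_tensorrt

trt_model = torch_tensorrt.compile(
    model.eval(),
    inputs=[torch_tensorrt.Input((32, 1024), dtype=torch.half)],  # placeholder shape
    enabled_precisions={torch.half},  # allow FP16 kernels
)
```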
Is this assumption sensible?
Please let me know.