Convert FP32 weights to an FP16 model?

Hello. I have a trained model with good segmentation results.
However, it is still a little bit slow.
I am wondering, is there any way I can convert this model to another type for speed?
My model is the MobileNetV3-Small model.
I see that half-precision (FP16) conversion can make the model faster than the pure FP32 model.
My GPU is a Titan V.
Thank you.

Hi

If you want to truncate/reduce the precision of the trained model's weights, you can do

net = Model()  # your trained model
net.half()     # cast all FP32 parameters and buffers to FP16

which converts all FP32 tensors to FP16 tensors.
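
For inference you would also cast the inputs to FP16 and disable gradient tracking. A minimal sketch, assuming a recent torchvision build that ships mobilenet_v3_small (the architecture named in the question) and a CUDA device:

import torch
import torchvision

model = torchvision.models.mobilenet_v3_small().cuda()
model.half()   # cast all FP32 parameters and buffers to FP16
model.eval()

# the input has to be cast to FP16 as well
x = torch.randn(1, 3, 224, 224, device="cuda").half()

with torch.no_grad():
    out = model(x)
print(out.dtype)  # torch.float16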

Thank you, I will try. Do you think this can reduce the inference time?

This can reduce the inference time if TensorCores are used for the workload.
However, note that FP16 can easily overflow, so your model might return NaNs if you are not careful (especially during training).
We thus recommend using automatic mixed-precision training, which is available in the nightly binaries or from a master build.
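
As a rough illustration, here is a minimal mixed-precision training loop sketch using torch.cuda.amp; model, optimizer, criterion, and loader are placeholders for your own objects:

import torch

scaler = torch.cuda.amp.GradScaler()

for data, target in loader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    # autocast runs eligible ops in FP16 while keeping the master weights in FP32
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = criterion(output, target)
    # loss scaling avoids FP16 gradient underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()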

Nevertheless, you could try calling half() directly and see if the inference workload suffers from these issues.

@ptrblck would converting the model to half precision reduce inference time on the CPU as well?

I don’t know what the current support for half-precision on the CPU is, so let’s wait for some CPU experts.

Henry, did that technique work for your model?