Convert FP32 weights to an FP16 model?

Hello. I have a trained model with good segmentation results.
However, it is still a little bit slow.
I am wondering, is there any way I can convert this model to another type for speed?
My model is the MobileNetV3-Small model.
I see that half-precision (FP16) conversion can make the model faster than the pure FP32 model.
My GPU is a Titan V.
Thank you.

Hi

If you want to truncate/reduce the precision of the trained model's weights, you can do

net = Model()  # your trained model
net.half()     # cast all FP32 parameters and buffers to FP16

which converts all FP32 tensors to FP16 tensors.
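
For inference you would also cast the inputs to FP16 and disable gradient tracking. A minimal sketch, assuming a recent torchvision build that ships mobilenet_v3_small (the architecture named in the question) and a CUDA device:

import torch
import torchvision

model = torchvision.models.mobilenet_v3_small().cuda()
model.half()   # cast all FP32 parameters and buffers to FP16
model.eval()

# the input has to be cast to FP16 as well
x = torch.randn(1, 3, 224, 224, device="cuda").half()

with torch.no_grad():
    out = model(x)
print(out.dtype)  # torch.float16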

Thank you, I will try. Do you think this can reduce the inference time?

This can reduce the inference time if TensorCores are used for the workload.
However, note that FP16 can easily overflow, so your model might return NaNs if you are not careful (especially during training).
We thus recommend using automatic mixed-precision training, which is available in the nightly binaries or from a master build.
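
As a rough illustration, here is a minimal mixed-precision training loop sketch using torch.cuda.amp; model, optimizer, criterion, and loader are placeholders for your own objects:

import torch

scaler = torch.cuda.amp.GradScaler()

for data, target in loader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    # autocast runs eligible ops in FP16 while keeping the master weights in FP32
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = criterion(output, target)
    # loss scaling avoids FP16 gradient underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()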

Nevertheless, you could try calling half() directly and see if the inference workload suffers from these issues.

@ptrblck would converting the model to half precision reduce inference time on the CPU as well?

I don’t know what the current support for half-precision on the CPU is, so let’s wait for some CPU experts.

Henry, did that technique work for your model?