Inference with HalfTensor for speedup

Can we first train a model using the default torch.Tensor type, which is torch.FloatTensor,
and then convert it to torch.HalfTensor for inference?

Or can we use torch.HalfTensor directly for both training and inference?


You can change the dtype of a tensor whenever you want, using my_tensor.half() or my_tensor.float(). My instinct would be to run the whole network in float and only convert to half at the very end, when computing the loss.
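A minimal sketch of the dtype conversions mentioned above (the tensor names are illustrative):

```python
import torch

x = torch.randn(4, 3)   # default dtype is torch.float32 (torch.FloatTensor)
h = x.half()            # cast to float16 (torch.HalfTensor)
f = h.float()           # cast back to float32

print(h.dtype, f.dtype)  # torch.float16 torch.float32
```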

We can use model.half() to convert the model's parameters and internal buffers
to half precision. But we did NOT get the significant improvement we expected
(2~4x faster) from using HalfTensor.
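For reference, a sketch of what the model.half() conversion looks like. The model here is a made-up toy example; the forward pass is shown only on CUDA, since CPU support for half-precision ops varies by PyTorch version:

```python
import torch
import torch.nn as nn

# Toy model for illustration only
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()
model.half()  # converts parameters and internal buffers in place to float16

# All parameters are now float16
assert all(p.dtype == torch.float16 for p in model.parameters())

# Inputs must match the parameter dtype
if torch.cuda.is_available():
    model.cuda()
    x = torch.randn(8, 256, device="cuda", dtype=torch.float16)
    with torch.no_grad():
        out = model(x)  # output is float16 as well
```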


Hi wlike,

Did you figure out why you didn't get a speedup? Which NVIDIA GPUs did you use? Did they support fast float16, like the V100 or P100, or was it a 1080 (Ti) or Titan, which does not?
How was your GPU memory consumption when you changed to HalfTensor?

Thanks in advance


Thanks for sharing this! This is valuable info! 🙂

We did not see any speedup. How about you?

I am using a 2080 Ti, but I cannot see any improvement when changing from fp32 to fp16 for inference with batch_size 1. Why is that, and how can I speed up inference? GPU memory usage did drop from 1905 MB to 1491 MB, though. Thanks.
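One common reason: at batch_size 1 the GPU is usually underutilized and the run is dominated by kernel-launch and memory-transfer overhead, so fp16's compute savings barely show up; fp16 mainly helps compute-bound workloads that hit the Tensor Cores. A sketch of the autocast-based alternative to a full model.half() conversion (the toy model is my own; the fp16 path only runs on a CUDA GPU):

```python
import torch

# Toy model for illustration; weights stay in float32
model = torch.nn.Linear(1024, 1024).eval()

if torch.cuda.is_available():
    model.cuda()
    x = torch.randn(64, 1024, device="cuda")  # larger batches amortize launch overhead
    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(x)  # matmuls run in float16 under autocast
```

Unlike model.half(), autocast keeps the parameters in float32 and only casts eligible ops, which avoids the dtype-mismatch errors you can hit with a hand-converted model.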

Answered here.


I did not observe a speedup either. Inference actually slowed down for me, using a V100 GPU and running a WaveGAN architecture.
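For anyone comparing fp32 vs fp16 timings: wall-clock numbers are easy to get wrong because CUDA kernels launch asynchronously. A small benchmarking helper (the `bench` name is my own) that warms up first and synchronizes around the timed region:

```python
import time
import torch

def bench(model, x, iters=100, warmup=10):
    """Return average seconds per forward pass, with warmup and CUDA sync."""
    with torch.no_grad():
        for _ in range(warmup):          # warmup: JIT/cudnn autotuning, caches
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # make sure warmup kernels finished
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # wait for timed kernels to finish
    return (time.perf_counter() - t0) / iters
```

Comparing `bench(model, x)` before and after `model.half()` (with `x` cast to match) gives a fairer fp32-vs-fp16 number than timing a single call.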