Inference with HalfTensor for speedup

Can we first train a model using the default torch.Tensor type, which is torch.FloatTensor,
and then convert it to torch.HalfTensor for inference?

Or can we use torch.HalfTensor directly for both training and inference?


You can change the dtype of a tensor whenever you want, using my_tensor.half() or my_tensor.float(). My instinct would be to run the whole network in float and only convert to half at the very end, when computing the loss.
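A minimal sketch of the dtype conversions mentioned above (the tensor names are illustrative):

```python
import torch

x = torch.randn(4, 3)   # default dtype is torch.float32 (torch.FloatTensor)
h = x.half()            # cast to float16 (torch.HalfTensor)
f = h.float()           # cast back to float32

print(h.dtype, f.dtype)  # torch.float16 torch.float32
```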

We can use model.half() to convert the model's parameters and internal buffers
to half precision. But we did NOT get the significant improvement we expected
(2~4x faster) from using HalfTensor.
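For reference, a sketch of what the model.half() conversion looks like. The model here is a made-up toy example; the forward pass is shown only on CUDA, since CPU support for half-precision ops varies by PyTorch version:

```python
import torch
import torch.nn as nn

# Toy model for illustration only
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()
model.half()  # converts parameters and internal buffers in place to float16

# All parameters are now float16
assert all(p.dtype == torch.float16 for p in model.parameters())

# Inputs must match the parameter dtype
if torch.cuda.is_available():
    model.cuda()
    x = torch.randn(8, 256, device="cuda", dtype=torch.float16)
    with torch.no_grad():
        out = model(x)  # output is float16 as well
```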


Hi wlike,

Did you figure out why you didn't get a speedup? Which NVIDIA GPUs did you use? Did they support fast float16, like the V100 or P100, or was it a 1080 (Ti) or Titan, which does not?
How was your GPU memory consumption when you changed to HalfTensor?

Thanks in advance


Thanks for sharing this! This is valuable info! 🙂

We did not see any speedup. How about you?

I am using a 2080 Ti, but I cannot see any improvement when changing from fp32 to fp16 for inference with batch_size 1. Why is that, and how can I speed up inference? GPU memory usage did drop from 1905 MB to 1491 MB, though. Thanks.
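One common reason: at batch_size 1 the GPU is usually underutilized and the run is dominated by kernel-launch and memory-transfer overhead, so fp16's compute savings barely show up; fp16 mainly helps compute-bound workloads that hit the Tensor Cores. A sketch of the autocast-based alternative to a full model.half() conversion (the toy model is my own; the fp16 path only runs on a CUDA GPU):

```python
import torch

# Toy model for illustration; weights stay in float32
model = torch.nn.Linear(1024, 1024).eval()

if torch.cuda.is_available():
    model.cuda()
    x = torch.randn(64, 1024, device="cuda")  # larger batches amortize launch overhead
    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
        out = model(x)  # matmuls run in float16 under autocast
```

Unlike model.half(), autocast keeps the parameters in float32 and only casts eligible ops, which avoids the dtype-mismatch errors you can hit with a hand-converted model.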

Answered here.


I did not observe a speedup either. Inference actually slowed down for me, using a V100 GPU and running a WaveGAN architecture.
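For anyone comparing fp32 vs fp16 timings: wall-clock numbers are easy to get wrong because CUDA kernels launch asynchronously. A small benchmarking helper (the `bench` name is my own) that warms up first and synchronizes around the timed region:

```python
import time
import torch

def bench(model, x, iters=100, warmup=10):
    """Return average seconds per forward pass, with warmup and CUDA sync."""
    with torch.no_grad():
        for _ in range(warmup):          # warmup: JIT/cudnn autotuning, caches
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # make sure warmup kernels finished
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()     # wait for timed kernels to finish
    return (time.perf_counter() - t0) / iters
```

Comparing `bench(model, x)` before and after `model.half()` (with `x` cast to match) gives a fairer fp32-vs-fp16 number than timing a single call.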