I converted a model trained in FP32 and I call model.half() and input.half() to run inference in FP16, but on a 2080 Ti the inference speed is almost the same as FP32. My batch size is fixed at 1. Why is that, and how can I get a real speedup from FP16? Thanks. (A sketch of what I'm doing is below.)
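
For reference, here is roughly how I'm measuring it. This is a minimal sketch, not my exact code: it assumes torchvision's resnet50 as a stand-in for my model and uses torch.cuda.synchronize() around the timers since CUDA kernels launch asynchronously.

```python
import time
import torch
import torchvision

# Assumption: resnet50 stands in for my actual FP32-trained model.
model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")  # batch size fixed to 1

def benchmark(model, x, iters=100):
    # Warm up, then time forward passes; synchronize because CUDA is async.
    with torch.no_grad():
        for _ in range(10):
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / iters

fp32_time = benchmark(model, x)

# Convert both the weights and the input to FP16, then time again.
model_fp16 = model.half()
x_fp16 = x.half()
fp16_time = benchmark(model_fp16, x_fp16)

print(f"FP32: {fp32_time * 1000:.2f} ms/iter, FP16: {fp16_time * 1000:.2f} ms/iter")
```

With this script the two timings come out almost identical on my 2080 Ti.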