I apologize if I’ve missed something obvious here — this question relates to issues I’m having timing mixed precision vs. float32 computation.
I have two servers — one with PyTorch 1.5 and CUDA 10.1, and the other with PyTorch 1.6 and CUDA 11.0. As far as I know there are no PyTorch CUDA 11.0 binaries, so that PyTorch build was presumably compiled against 10.1. Both servers have RTX 2080 Ti GPUs.
On both servers, I measure pure fp32 computation as significantly faster than mixed precision, and I can’t work out why.
I am aware that the mismatch between the installed CUDA toolkit and the compiled binaries on the PyTorch 1.6 server is not ideal, but I’m still not sure why there should be an issue on the PyTorch 1.5 server.
My times are as follows:
I’ve attached gists for the two scripts that I’m using to compute times below.
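For context, here is a minimal sketch of the kind of timing comparison I mean (not my actual gists — a simplified stand-in using a single large matmul). Note that `torch.cuda.amp.autocast` only exists from PyTorch 1.6 onward, so on the 1.5 server the mixed-precision path would have to come from NVIDIA Apex instead; the CPU fallback here is only so the script runs anywhere:

```python
import time
import torch

def bench(fn, iters=20, warmup=5):
    # Warm up so one-time costs (kernel launch, autotuning) don't skew timing
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # CUDA calls are async; sync before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # ...and again before stopping it
    return (time.perf_counter() - start) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

t_fp32 = bench(lambda: a @ b)
print(f"fp32:  {t_fp32 * 1e3:.3f} ms/iter")

if device == "cuda":
    # Available in PyTorch >= 1.6 only; PyTorch 1.5 would need Apex amp instead.
    from torch.cuda.amp import autocast

    def mixed():
        with autocast():
            a @ b

    t_amp = bench(mixed)
    print(f"mixed: {t_amp * 1e3:.3f} ms/iter")
```

The `torch.cuda.synchronize()` calls matter: without them, asynchronous kernel launches make the wall-clock numbers meaningless.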
Output of nvcc --version
Based on this comment/thread:
I would expect there to be a speedup on a 2080 Ti - is this correct?