To work with a remote GPU server running CentOS 6.9, I had to install PyTorch from source (the packaged version has an incompatibility with glibc). I ran 20 batches of my training process under the autograd profiler and inspected the trace with chrome://tracing. My local machine, which has a GeForce 1080 Ti, processes a batch 10x faster than the remote server (which uses a single Tesla K80 GPU). The GPUs don't have the same specs, but I would expect only about a 2x slowdown on the Tesla.
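For reference, the profiling loop looks roughly like this (a minimal sketch; the model, loss, and data below are stand-ins for my actual conv net and DataLoader):

```python
import torch
import torch.nn as nn
from torch.autograd import profiler

# Minimal stand-ins for the real model and data (the actual model is a conv net).
model = nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Profile ~20 training batches and export a trace viewable in chrome://tracing.
with profiler.profile(use_cuda=True) as prof:
    for _ in range(20):
        inputs = torch.randn(8, 3, 224, 224, device="cuda")
        targets = torch.randn(8, 16, 224, 224, device="cuda")
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()   # this is where the slow backward conv op shows up
        optimizer.step()

prof.export_chrome_trace("trace.json")  # load this file in chrome://tracing
```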
I singled out the backward convolution operation, which is extremely slow on the server. Oddly, the trace does not show the same function name on the two machines. Here is the comparison:
| GPU | Backward conv op | Time |
|---|---|---|
| Tesla K80 (server) | `ThnnConv2DBackward` | 62 ms |
| GeForce 1080 Ti (local) | `CudnnConvolutionBackward` | 0.0000037 ms |
From the function names, one could deduce that cuDNN is not being used on the server (`ThnnConv2DBackward` is the fallback implementation, while `CudnnConvolutionBackward` goes through cuDNN). Yet nvidia-smi shows 75-99% GPU utilization during training, so CUDA itself seems to be working. Do I have a problem with my PyTorch installation?
If cuDNN isn't detected at build time, you could try setting CUDNN_LIB_DIR (the directory containing libcudnn.so or similar) and CUDNN_INCLUDE_DIR (the directory containing cudnn.h) before rebuilding from source.
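Before rebuilding, a quick sanity check on the server can confirm whether the current build picked up cuDNN at all:

```python
import torch

print(torch.cuda.is_available())        # True => CUDA works (consistent with nvidia-smi)
print(torch.backends.cudnn.enabled)     # the cuDNN backend is allowed to run
print(torch.backends.cudnn.version())   # an integer if cuDNN was found at build time,
                                        # None if the build fell back to THNN/THCUNN
```

If `version()` returns None, the build never found cuDNN, which would explain the `ThnnConv2DBackward` nodes in the trace despite the high GPU utilization.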