From-source installation on Quadro RTX 6000 slower than conda installation

Hi all,

We recently got new server nodes with Quadro RTX 6000 GPUs and I have trouble getting PyTorch to its top speed on them. Oddly, the from-source installation is partially slower than the conda installation. I benchmarked one forward + backward pass through a few stacks of 2D and 3D convolutions. For the 2D convolutions, both with and without amp, the from-source installation (CUDA 11.1, cuDNN 8.0.4, PyTorch 1.8) was twice as slow as the conda installation (cudatoolkit=11.0, pytorch=1.7).
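For context, the benchmark was essentially of this shape (a sketch, not the exact script; the layer sizes, input shape, and iteration counts here are illustrative):

```python
import time

import torch
import torch.nn as nn


def bench_conv3d(device="cuda", use_amp=True, iters=20):
    # Small stack of 3D convolutions; channel counts and input shape are made up.
    net = nn.Sequential(*[nn.Conv3d(8, 8, 3, padding=1) for _ in range(4)]).to(device)
    x = torch.randn(2, 8, 32, 32, 32, device=device)

    def step():
        net.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = net(x).mean()
        loss.backward()  # one forward + backward pass

    for _ in range(3):  # warm-up so cudnn can select its algorithms
        step()
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Timing is averaged over several iterations after a warm-up, with `torch.cuda.synchronize()` around the timed region so the asynchronous CUDA kernels are actually finished before the clock is read.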

I could compare the running times against my workstation, which has a GeForce RTX 2080 Ti. The 2D convolutions of the conda installation on the Quadro card are a bit faster than the from-source installation on the GeForce card. This is to be expected, as the cards share the same architecture and the Quadro has slightly more CUDA cores.

However, I mostly use 3D convolutions and amp in my research. Here the from-source installation was the fastest on both cards, but the Quadro is 2.5 times slower than the GeForce card.

When I compare the conda installations on both cards, the Quadro is consistently a bit faster than the GeForce card; only the from-source installation on the Quadro is way too slow.

Does anyone have an idea what the reason behind this is? Is it possible that something is wrong with the CUDA compiler? We have had problems compiling CUDA code elsewhere on this system. As this is a departmental machine, we have a range of different CUDA modules available.

Thanks for the help!

The 1.7.1 binaries use cuDNN 8.0.5.39, which I assume is the reason for the main difference between your setup and the binaries.
Try to install this particular version for a proper comparison, or the latest one (cuDNN 8.1).

Thanks! In the meantime I figured out that it is not the version of cuDNN, but that my from-source installation didn't recognize cuDNN: torch.backends.cudnn.is_available() is False for all my attempts at installing torch from source and True for the conda installations.
I installed PyTorch from source by creating a new conda environment with Python 3.8, loading a cuda-11.0.2 module, and following the steps in the installation instructions. I exported CUDA_HOME, CUDNN_LIB_DIR, CUDNN_LIBRARY, and CUDNN_INCLUDE before executing "python setup.py install". The installation works and CUDA is available, but cuDNN is not. Any ideas how to fix it?
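For completeness, this is the kind of check I used to compare the builds (all standard torch introspection calls; the dict keys are just names I chose):

```python
# Quick sanity check of what a PyTorch build was compiled/linked against.
import torch


def build_report():
    return {
        "torch": torch.__version__,
        "cuda_build_version": torch.version.cuda,         # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None if cudnn was not found
    }


if __name__ == "__main__":
    for key, value in build_report().items():
        print(f"{key}: {value}")
```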

Did you follow the cuDNN install guide (Installation Guide :: NVIDIA Deep Learning cuDNN Documentation)? If so, could you download the cuDNN samples and try to build and execute them?
If PyTorch isn’t able to find cudnn, I guess the installation wasn’t successful.

I couldn’t go through the full installation from your link, as it requires root access. Is there a way to tell torch where the cuDNN library is after import?
I think the solution you suggest requires a local installation of CUDA directly on that machine. The system administrators would prefer that we work with CUDA/cuDNN modules instead.
EDIT: If it is not possible to install PyTorch from source using CUDA/cuDNN modules and have it recognise cuDNN, which versions of CUDA and cuDNN would you suggest for an up-to-date from-source installation? The newest ones that are compatible with torch?
Thanks for the great help :slight_smile:

I’m unsure what you mean by CUDA and cudnn “modules”. A source build would require a local CUDA toolkit installation including the compiler (nvcc).
Yes, the latest CUDA and cudnn versions are compatible with the current PyTorch master branch, but note some issues we’ve found, such as:

While it seems to be a CUDA 11.2 compiler issue, we’ve internally reproduced an illegal memory access caused in the multi_tensor_apply implementation, which becomes visible because the compiler now applies more aggressive optimizations.

Sorry for the delay in getting back. By “modules” I mean environment modules (the `module load` system commonly used on shared clusters).
I found out that the reason the from-source installation was slower was indeed that it didn’t use cuDNN. I could download cuDNN and set the respective paths (CUDNN_LIBRARY and CUDNN_INCLUDE_DIR) before building. However, when loading an environment module and exporting CUDA_HOME, PyTorch did not find cuDNN after installation, even with the cuDNN paths set. When using a local installation of CUDA (11.0 in our case) we didn’t have this problem.
Anyway, we now have the computational speed we want; only the from-source installation with CUDA modules doesn’t seem to work.
Thanks for the help!

EDIT: I managed to install PyTorch from source using only environment modules. One obviously needs to load the same environment modules again before using torch, but during installation one also has to set CUDNN_ROOT to point to the cuDNN folder containing “include” and “lib64”. Maybe this variable should be added to the installation instructions.
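In case it helps others, the pre-build setup looked roughly like this (the paths are placeholders for whatever your module system actually provides):

```python
# Hypothetical pre-build environment setup; the actual paths depend on your
# site's module system and are placeholders here.
import os

cuda_home = "/opt/modules/cuda/11.0.2"   # hypothetical path exposed by `module load cuda`
cudnn_root = "/opt/modules/cudnn/8.0.5"  # hypothetical path; must contain include/ and lib64/

os.environ["CUDA_HOME"] = cuda_home
os.environ["CUDNN_ROOT"] = cudnn_root
os.environ["CUDNN_LIBRARY"] = os.path.join(cudnn_root, "lib64")
os.environ["CUDNN_INCLUDE_DIR"] = os.path.join(cudnn_root, "include")
# ...then run `python setup.py install` in the PyTorch checkout from this same session.
```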

By default, the cuDNN installation would be located in /usr/local/cuda/include/ and /usr/local/cuda/lib64/, which are the standard CUDA paths (or whichever location CUDA_HOME points to). In case your cuDNN installation uses a different path, you would most likely need to set the cuDNN path manually. In the former case, manually setting CUDNN_ROOT is not needed.
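A quick way to check which case applies on a given machine (assuming the standard Linux layout described above; the helper name is mine):

```python
# Check whether cudnn headers/libraries are visible in the standard CUDA
# location, or under CUDA_HOME if that is set. Paths follow the usual
# Linux layout (include/cudnn.h, lib64/libcudnn*).
import glob
import os


def find_cudnn(cuda_home=None):
    cuda_home = cuda_home or os.environ.get("CUDA_HOME", "/usr/local/cuda")
    header = os.path.join(cuda_home, "include", "cudnn.h")
    libs = glob.glob(os.path.join(cuda_home, "lib64", "libcudnn*"))
    return os.path.isfile(header), libs


if __name__ == "__main__":
    found, libs = find_cudnn()
    print("cudnn.h found:", found)
    print("libraries:", libs or "none")
```

If the header and libraries show up here, the PyTorch build should pick cuDNN up without any extra environment variables; otherwise the CUDNN_* variables need to point at the actual location.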