So I just used Packer to bake my own images for GCE and ran into the following situation.
I installed CUDA 9.0 and everything worked fine; I could train my models on the GPU.
After a while I noticed I had forgotten to install cuDNN, but PyTorch does not seem to complain about this. On an image with only CUDA installed, torch.backends.cudnn.version() returns 7102 and torch.backends.cudnn.enabled == True.
When I did install cuDNN from https://developer.nvidia.com/cudnn, everything still worked fine and I still got the same outputs for the two commands above, but I didn’t get any significant speedup.
Does this mean that if one installs only CUDA and PyTorch, cuDNN also gets magically installed? Or is there a way to check whether PyTorch is really using the speedups promised by cuDNN?
Any advice? Thanks
How did you install PyTorch?
The binaries are shipped with CUDA and cuDNN already.
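To see what an installed binary reports about itself, something like the following might help. It is only a rough sketch: the helper names (load_torch, cudnn_report, time_conv) and the layer and tensor sizes are made up for illustration, and the timing part only runs if a GPU is visible.

```python
import time


def load_torch():
    """Return the torch module, or None if it is missing or fails to load."""
    try:
        import torch
        return torch
    except (ImportError, OSError):
        return None


def cudnn_report(torch):
    """Collect what this PyTorch build reports about CUDA and cuDNN."""
    return {
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "cudnn_version": torch.backends.cudnn.version(),
    }


def time_conv(torch, use_cudnn, iters=20):
    """Average seconds per convolution forward pass with cuDNN on or off."""
    # Arbitrary illustrative sizes; real speedups depend on model and GPU.
    conv = torch.nn.Conv2d(64, 64, 3, padding=1).cuda()
    x = torch.randn(32, 64, 56, 56, device="cuda")
    with torch.backends.cudnn.flags(enabled=use_cudnn):
        with torch.no_grad():
            conv(x)  # warm-up run, not timed
            torch.cuda.synchronize()
            start = time.perf_counter()
            for _ in range(iters):
                conv(x)
            torch.cuda.synchronize()
            elapsed = time.perf_counter() - start
    return elapsed / iters


if __name__ == "__main__":
    torch = load_torch()
    if torch is None:
        print("torch could not be imported")
    else:
        for key, value in cudnn_report(torch).items():
            print(key, "=", value)
        if torch.cuda.is_available() and torch.backends.cudnn.is_available():
            print("conv with cuDNN   :", time_conv(torch, True))
            print("conv without cuDNN:", time_conv(torch, False))
```

If the two timings differ noticeably, the binary is using its own cuDNN, whether or not a system-wide copy is installed.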
I used a script like this to install CUDA, cuDNN and Python, and then used pipenv install torch to install PyTorch. The image was based on Google Cloud’s “ubuntu-1604-lts”. But even if I comment out the line that installs cuDNN, nothing seems to change for my PyTorch installation?
# install CUDA
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
  # The 16.04 installer works with 16.10.
  sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
  sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
  sudo apt-get update
  sudo apt-get install cuda-9-0 -y
fi
# install cuDNN
sudo dpkg -i /tmp/libcudnn7_184.108.40.206-1+cuda9.0_amd64.deb
# install python
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.6
sudo apt-get install -y python3-pip
# install pipenv
sudo pip3 install pipenv
Ok, I just found an answer by soumith on another thread:
“if you want to use pytorch with an NVIDIA GPU, all you need to do is install pytorch binaries and start using it. We ship with everything in-built (pytorch binaries include CUDA, CuDNN, NCCL, MKL, etc.).”
So that means the whole installing CUDA and cuDNN on Ubuntu shenanigans are actually not necessary at all?! That would also explain my confusion about why I get the same training time whether I install cuDNN or not.
Sorry for the confusion.
Yes, you just need to install the NVIDIA drivers and the binaries will come with the other libs.
If you want to build from source, you would need to install CUDA, cuDNN etc.
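To look at which libraries actually came along with the binary, one could list the shared objects inside the installed torch package. This is a rough sketch under assumptions: torch_package_dir and find_bundled_libs are made-up helper names, the keyword list is just a guess at useful name fragments, and the on-disk layout differs between PyTorch versions (some newer wheels pull the CUDA libraries from separate pip packages), so an empty result is not conclusive.

```python
import importlib.util
import os


def torch_package_dir():
    """Locate the installed torch package without importing it."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def find_bundled_libs(package_dir, keywords=("cudart", "cudnn", "nccl")):
    """Return relative paths of shared objects whose names contain a keyword."""
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if ".so" in name and any(k in name for k in keywords):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, package_dir))
    return sorted(found)


if __name__ == "__main__":
    package_dir = torch_package_dir()
    if package_dir is None:
        print("torch is not installed in this environment")
    else:
        for path in find_bundled_libs(package_dir):
            print(path)
```

Because it never imports torch, this also works in an environment where the import itself fails.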
If that is the case, why did I encounter the following error when importing torch:
libcudart.so.10.0: cannot open shared object file: No such file or directory
I installed the PyTorch 1.0 binary with CUDA 10, and I already have CUDA 9.0 on my system.
If PyTorch ships with everything built in, why can’t it find something that comes with it?
Could you check your LD_LIBRARY_PATH to see if you have some libs linking against your own libcudart as described in this issue?
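A quick way to do that check might look like the sketch below. It only walks the directories listed in LD_LIBRARY_PATH and reports any file whose name starts with libcudart; the function name find_cudart_dirs is made up for illustration, and the scan does not cover other places the dynamic loader searches (such as the ldconfig cache).

```python
import os


def find_cudart_dirs(ld_library_path):
    """Return (directory, filename) pairs for libcudart files on the given path."""
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.startswith("libcudart"):
                hits.append((directory, name))
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", path or "(empty)")
    for directory, name in find_cudart_dirs(path):
        print(directory, name)
```

If this only shows a libcudart.so.9.0 from the system CUDA install, that would at least be consistent with the loader picking up system libraries rather than a 10.0 runtime.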