CUDA / CUDNN basics

chilango · August 14, 2017, 8:36pm

Trying to run an LSTM batch on a local GPU and I’m getting the following warning:

/home/xxx/.conda/envs/torchenv/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py:40: UserWarning: PyTorch was compiled without cuDNN support. To use cuDNN, rebuild PyTorch making sure the library is visible to the build system. "PyTorch was compiled without cuDNN support. To use cuDNN, rebuild "

These are the relevant packages in my environment:

(torchenv) [~/git/nn]$ conda list
cuda80 1.0 0 soumith
cudatoolkit 7.5 2
cudnn 6.0.21 cuda7.5_0
pytorch 0.1.12 py36cuda7.5cudnn6.0_1

In [2]: torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1))
False

In [3]: print(torch.backends.cudnn.version())
None

In [4]: print(torch.cuda.is_available())
True
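The three interactive checks above can be gathered into one helper, which makes it easier to paste a complete report into a thread like this one. This is only a minimal sketch: `describe_torch_build` is a hypothetical name, the `torch` import is guarded so the script also runs where PyTorch is missing, and `torch.version.cuda` is read defensively on the assumption that it may not exist on builds as old as 0.1.12.

```python
import importlib


def describe_torch_build():
    """Return a dict summarising what the importable torch reports about CUDA/cuDNN."""
    info = {"torch_installed": False}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # nothing more to report without torch

    info["torch_installed"] = True
    info["torch_version"] = getattr(torch, "__version__", None)
    # Assumption: torch.version.cuda may be absent on very old builds.
    info["built_for_cuda"] = getattr(getattr(torch, "version", None), "cuda", None)
    info["cuda_available"] = torch.cuda.is_available()
    # Returns None when the build cannot use cuDNN, as in the session above.
    info["cudnn_version"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

A `cudnn_version` of `None` next to `cuda_available: True` reproduces the situation described in this post.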

So CUDA and cuDNN are available, but PyTorch was built against an earlier version than the one on my system. A few beginner questions that I can’t find answered in other threads here:

Am I seeing the “compiled without cuDNN support” errors because of version mismatch between build and visible versions?
If so, what is the best way to solve this? Should I uninstall pytorch and build from source? Or is there an organic way to do this using conda?
What are some good practices to keep a local node / cluster up to date and avoid these mismatches?

Using Linux 4.12.4-1-ARCH fwiw.

Thanks!

chilango · August 14, 2017, 8:58pm

Not sure if also relevant:

(torchenv) [~/git/nn]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
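To make the suspected mismatch explicit, the CUDA version in the conda build string can be compared with the release that `nvcc` reports. The sketch below assumes the build string follows the `py36cuda7.5cudnn6.0_1` pattern seen in the `conda list` output; `parse_conda_build` and `parse_nvcc_release` are hypothetical helper names, not part of conda or PyTorch. Whether such a mismatch actually causes the cuDNN warning is not established in this thread.

```python
import re


def parse_conda_build(build):
    """Extract CUDA and cuDNN versions from a build string like 'py36cuda7.5cudnn6.0_1'."""
    cuda = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    cudnn = re.search(r"cudnn(\d+(?:\.\d+)*)", build)
    return {
        "cuda": cuda.group(1) if cuda else None,
        "cudnn": cudnn.group(1) if cudnn else None,
    }


def parse_nvcc_release(nvcc_output):
    """Extract the toolkit release (e.g. '8.0') from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+(?:\.\d+)*)", nvcc_output)
    return match.group(1) if match else None


# Values copied from the posts above.
build = parse_conda_build("py36cuda7.5cudnn6.0_1")
toolkit = parse_nvcc_release("Cuda compilation tools, release 8.0, V8.0.61")

print(build)                     # {'cuda': '7.5', 'cudnn': '6.0'}
print(toolkit)                   # 8.0
print(build["cuda"] == toolkit)  # False
```

Here the package was built for CUDA 7.5 while the system toolkit is 8.0, which matches the `cuda80` and `cudatoolkit 7.5` entries sitting side by side in the environment.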

QuantScientist · August 15, 2017, 12:15pm

This is how I compiled PyTorch inside a Docker image with CUDA support:

github.com

QuantScientist/Deep-Learning-Boot-Camp/blob/master/docker/Dockerfile.gpu3

FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04


ENV CUDA_ARCH_BIN "30 35 50 52 60"
ENV CUDA_ARCH_PTX "60"

RUN rm -rf /var/lib/apt/lists/*
RUN apt-get clean

RUN apt-get update && apt-get install --no-install-recommends  -y \
    git cmake build-essential libgoogle-glog-dev libgflags-dev libeigen3-dev libopencv-dev libcppnetlib-dev libboost-dev libboost-all-dev libboost-iostreams-dev libcurl4-openssl-dev protobuf-compiler libopenblas-dev libhdf5-dev libprotobuf-dev libleveldb-dev libsnappy-dev liblmdb-dev libutfcpp-dev wget unzip  \
    python \
    python-dev \
    python2.7-dev \
    python3-dev \
    python-virtualenv \
    python-wheel \
    python-tk \
    pkg-config \
    libopenblas-base \

This file has been truncated.

QuantScientist · August 15, 2017, 12:16pm

Also try this:

github.com

QuantScientist/Deep-Learning-Boot-Camp/blob/master/docker/deps_nvidia_docker.sh

#!/usr/bin/env bash

apt-get install nvidia-modprobe

# curl -O -s https://raw.githubusercontent.com/minimaxir/keras-cntk-docker/master/deps_nvidia_docker.sh
if lspci | grep -i 'nvidia'
then
  # printf interprets \n; bash's builtin echo would print it literally.
  printf '\nNVIDIA GPU is likely present.\n'
else
  printf '\nNo NVIDIA GPU was detected. Exiting ...\n\n'
  exit 1
fi

printf '\nChecking for NVIDIA drivers ...\n'
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update

This file has been truncated.

chilango · August 15, 2017, 7:38pm

Thanks for the list of packages. I upgraded to 0.2.0 and now it automagically works. Thanks to the dev team for fixing whatever bug this was.

Savitha · October 28, 2017, 11:37am

I am facing the same issue. I updated PyTorch to 0.2.0 and the update succeeded, but when I check the version from code, it reports 0.1.12.

These are the installed versions:

pytorch-gpu version 0.1.12
pytorch version 0.2.0

Is Python importing pytorch-gpu? When I uninstalled pytorch-gpu, it reported that there is no module torch.
How can I fix this?
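When two packages both provide `torch`, it helps to see which files Python would actually import and which distributions are registered in the environment. The following is a rough sketch under some assumptions: `locate_torch` and `installed_torch_distributions` are hypothetical names, `importlib.metadata` needs Python 3.8 or newer (newer than the 3.6 environment in the first post), and it only sees packages that ship standard metadata, so `conda list` remains the reference for a conda environment.

```python
import importlib.util
from importlib import metadata  # Python 3.8+


def locate_torch():
    """Return the file Python would import torch from, or None if it cannot be found."""
    spec = importlib.util.find_spec("torch")
    return spec.origin if spec is not None else None


def installed_torch_distributions():
    """List (name, version) for every installed distribution whose name contains 'torch'."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and "torch" in name.lower():
            found.append((name, dist.version))
    return sorted(found, key=lambda item: item[0].lower())


if __name__ == "__main__":
    print("torch would be imported from:", locate_torch())
    for name, version in installed_torch_distributions():
        print(f"{name} {version}")
```

If the reported path lies inside a different environment or package than the one that was updated, that would explain an old `torch.__version__` showing up after an apparently successful upgrade.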