Error when using CUDNN

I am getting the error below. I put cuDNN in a directory of my own and, from inside that directory, added it to $LD_LIBRARY_PATH with:

 export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH

How can I find and fix the problem?

Traceback (most recent call last):
  File "main.py", line 157, in <module>
    train()
  File "main.py", line 131, in train
    output, hidden = model(data, hidden)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/disk1/workbench/learn/pytorch/examples/word_language_model/model.py", line 28, in forward
    output, hidden = self.rnn(emb, hidden)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 81, in forward
    return func(input, self.all_weights, hx)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 235, in forward
    return func(input, *fargs, **fkwargs)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 201, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 223, in forward
    result = self.forward_extended(*nested_tensors)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 180, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 184, in forward
    handle = cudnn.get_handle()
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/__init__.py", line 337, in get_handle
    handle = CuDNNHandle()
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/__init__.py", line 128, in __init__
    check_error(lib.cudnnCreate(ctypes.byref(ptr)))
  File "/data/disk1/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/__init__.py", line 324, in check_error
    raise CuDNNError(status)
torch.backends.cudnn.CuDNNError: 6: CUDNN_STATUS_ARCH_MISMATCH
Exception ctypes.ArgumentError: "argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1" in <bound method CuDNNHandle.__del__ of <torch.backends.cudnn.CuDNNHandle instance at 0x7fa7707dd5f0>> ignored
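
A quick way to see what the dynamic loader can find is to walk the directories listed in LD_LIBRARY_PATH and report any cuDNN shared objects in them. The sketch below is only a diagnostic, assuming the library files are named libcudnn.so*; the helper name find_cudnn_libs is made up for illustration and is not part of PyTorch.

```python
import glob
import os


def find_cudnn_libs(ld_library_path=None):
    """Return (directory, filename) pairs for libcudnn files on the search path.

    Hypothetical helper, not part of PyTorch.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            # The loader treats an empty entry as the current directory;
            # it is ignored here to keep the sketch simple.
            continue
        for path in sorted(glob.glob(os.path.join(directory, "libcudnn.so*"))):
            hits.append((directory, os.path.basename(path)))
    return hits


if __name__ == "__main__":
    for directory, name in find_cudnn_libs():
        print(directory, name)
```

Because the export command above records whatever directory it was run from, an empty result would suggest it was run somewhere other than the directory that holds cuDNN.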

Are you sure you have the correct cuDNN version? It needs to be R5 or R6.

I use cuDNN v5 (May 12, 2016), for CUDA 7.5.
My CUDA version is:

nvcc  --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

How can I find more details about my cuDNN installation?
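
One place to look, assuming the cuDNN header was copied along with the library, is cudnn.h: the cuDNN 5 and 6 headers define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The sketch below parses those macros from header text. The helper name and the sample string are illustrative only, and newer cuDNN releases keep these macros in a separate cudnn_version.h.

```python
import re


def parse_cudnn_header(text):
    """Extract (major, minor, patch) from the text of a cuDNN header.

    Hypothetical helper; returns None if any of the three macros is missing.
    """
    values = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


# Illustrative header fragment, not copied from a real installation.
sample = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 5
"""
print(parse_cudnn_header(sample))
```

On a real installation, the input would be the contents of cudnn.h read from wherever the header was copied.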

Can you please run torch.backends.cudnn.version()?

Hi @apaszke, I got the following error:

torch.backends.cudnn.version()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-de1bb2d5285f> in <module>()
----> 1 torch.backends.cudnn.version()

/global-hadoop/home/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/__init__.pyc in version()
     73 def version():
     74     if not lib:
---> 75         raise RuntimeError("cuDNN not initialized")
     76     if len(__cudnn_version) == 0:
     77         __cudnn_version.append(lib.cudnnGetVersion())

RuntimeError: cuDNN not initialized

Since I don’t have root privileges, I copied the system CUDA folder into my own directory and set the CUDA_ROOT and CUDA_HOME variables to that path. Afterwards, I copied the cuDNN files into that path, following this answer (http://stackoverflow.com/questions/39262468/installing-cudnn-for-theano-without-root-access). Do you have any suggestions for this situation?


Right, sorry. Do this please:

print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
print(torch.backends.cudnn.version())

Hi @apaszke, I tried the code and got:

In [1]: import torch

In [2]: print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
   ...: print(torch.backends.cudnn.version())
   ...:
True
5005

Now the error has changed to:

torch.backends.cudnn.CuDNNError: 6: CUDNN_STATUS_ARCH_MISMATCH
Exception ctypes.ArgumentError: "argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1" in <bound method CuDNNHandle.__del__ of <torch.backends.cudnn.CuDNNHandle instance at 0x7f4b9099a320>> ignored

I just found out that my GPU is a Tesla M2075. I found a similar issue reported for Caffe, saying that cuDNN has higher hardware requirements than plain CUDA. Is cuDNN not supported on this Tesla card? Can I run the sample code with only CUDA instead of cuDNN?
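
As a side note on the 5005 printed above: it is cuDNN's packed version number. For the cuDNN 5 and 6 releases discussed in this thread, the value is major * 1000 + minor * 100 + patch, so it can be unpacked as in this sketch. The function name is illustrative, and newer cuDNN releases may encode the number differently.

```python
def decode_cudnn_version(packed):
    """Split a packed cuDNN 5/6-style version into (major, minor, patch)."""
    major, rest = divmod(packed, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(5005))  # (5, 0, 5)
```

So the loaded library is cuDNN 5.0.5, which meets the R5 or R6 requirement mentioned earlier in the thread.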


The M2075 is a Fermi-architecture card, and cuDNN is not supported on it. You can disable cuDNN by setting torch.backends.cudnn.enabled = False. But you can expect only very modest speed-ups with such an old card.
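
A sketch of how a script could make that decision automatically, assuming cuDNN 5 and 6 need compute capability 3.0 or higher (Kepler and later; Fermi cards such as the M2075 are 2.x). The helper cudnn_usable is hypothetical. The commented-out usage relies on torch.cuda.get_device_capability, which exists in current PyTorch releases but may not exist in the 0.1.9 build used in this thread.

```python
def cudnn_usable(capability):
    """Return True if a (major, minor) compute capability can run cuDNN.

    Assumes the minimum supported capability is 3.0 (Kepler).
    """
    major, _minor = capability
    return major >= 3


print(cudnn_usable((2, 0)))  # Tesla M2075 (Fermi): False
print(cudnn_usable((5, 2)))  # a Maxwell card: True

# With PyTorch available, the result could drive the cuDNN switch
# (left commented out so the sketch runs without a GPU):
# import torch
# if not cudnn_usable(torch.cuda.get_device_capability(0)):
#     torch.backends.cudnn.enabled = False
```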

@ngimel, thanks for your help. However, I have run into another problem:

THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487343590888/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu line=246 error=8 : invalid device function
Traceback (most recent call last):
  File "main.py", line 157, in <module>
    train()
  File "main.py", line 131, in train
    output, hidden = model(data, hidden)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/disk1/ckyn/workbench/learn/pytorch/examples/word_language_model/model.py", line 28, in forward
    output, hidden = self.rnn(emb, hidden)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
ons/rnn.py", line 138, in forward
    nexth, output = func(input, hidden, weight)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 67, in forward
    hy, output = inner(input, hidden[l], weight[l])
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 96, in forward
    hidden = inner(input[i], hidden, *weight)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 22, in LSTMCell
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 752, in __add__
    return self.add(other)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 292, in add
    return self._add(other, False)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 286, in _add
    return Add(inplace)(self, other)
  File "/data/disk1/ckyn/ProgramFiles/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 13, in forward
    return a.add(b)
RuntimeError: cuda runtime error (8) : invalid device function at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487343590888/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:246

Any idea what causes this?

The PyTorch binaries are not built for your architecture:
https://github.com/pytorch/builder/blob/master/conda/pytorch-0.1.9/build.sh#L5
(yours is compute capability 2.0). Try compiling from source, but that may fail too, as your card is very old. Even if it works, expect very small speed-ups (if any).
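
The reasoning can be sketched as a small check. Native GPU code compiled for capability X.y runs on devices X.z with z >= y, and an entry marked +PTX can additionally be JIT-compiled for any newer device. The architecture list below is an assumed example in the TORCH_CUDA_ARCH_LIST format, not a quote of the linked build script, and binary_runs_on is a hypothetical helper.

```python
def binary_runs_on(device_cap, arch_list):
    """Check whether a build for `arch_list` can run on `device_cap`.

    device_cap: (major, minor) tuple, e.g. (2, 0).
    arch_list:  string such as "3.0;3.5;5.0;5.2+PTX".
    """
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        has_ptx = entry.endswith("+PTX")
        major, minor = (int(p) for p in entry.replace("+PTX", "").split("."))
        # Native code: same major version, device minor at least as new.
        if major == device_cap[0] and minor <= device_cap[1]:
            return True
        # PTX: can be JIT-compiled for any device at least as new.
        if has_ptx and (major, minor) <= tuple(device_cap):
            return True
    return False


ASSUMED_ARCH_LIST = "3.0;3.5;5.0;5.2+PTX"  # illustrative value only
print(binary_runs_on((2, 0), ASSUMED_ARCH_LIST))  # False
print(binary_runs_on((6, 1), ASSUMED_ARCH_LIST))  # True, via PTX
```

When no entry is at or below the device's capability, the binary contains no kernel the device can run, which is what "invalid device function" reports.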

I get a warning about cuDNN when I run the official mnist.py example. The message is:

“/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/__init__.py:57: UserWarning: cuDNN library not found. Check your LD_LIBRARY_PATH
}.get(sys.platform, ‘LD_LIBRARY_PATH’)))”

Sure, if I add the cuDNN library directory to LD_LIBRARY_PATH, the warning goes away.
However, on Linux distributions such as Ubuntu 16.xx, libraries are normally registered through the dynamic linker's run-time bindings (configured with ldconfig), and there is usually no need for the LD_LIBRARY_PATH environment variable.
Can PyTorch find cuDNN that way?

We can’t, because some Python bindings load cuDNN dynamically using ctypes, and it has to find it somehow. But we could save the path to the place where cuDNN was found during install.

The problem is not ctypes (it looks in the ld cache) and not the ld cache per se. The problem is that the ld cache typically contains libname.so.MAJOR (verify this with ldconfig -p), whereas for cuDNN PyTorch tries to load libcudnn.so.MAJOR.MINOR.PATCH. Try adding libcudnn.so.MAJOR.MINOR.PATCH to your ld cache (maybe with ldconfig -l?).
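
To see which cuDNN names the cache holds, the output of ldconfig -p can be filtered. The sketch below parses such output. The sample text is made up to show the usual "name (flags) => path" line format, and cudnn_cache_entries is a hypothetical helper.

```python
def cudnn_cache_entries(ldconfig_output):
    """Return (soname, path) pairs for libcudnn lines in `ldconfig -p` output."""
    entries = []
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue  # header line or anything that is not a cache entry
        left, path = line.split("=>", 1)
        parts = left.split()
        if not parts:
            continue
        soname = parts[0]
        if soname.startswith("libcudnn.so"):
            entries.append((soname, path.strip()))
    return entries


# Made-up sample in the format printed by `ldconfig -p`.
sample = (
    "2 libs found in cache `/etc/ld.so.cache'\n"
    "\tlibcudnn.so.5 (libc6,x86-64) => /usr/local/cuda-8.0/lib64/libcudnn.so.5\n"
    "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6\n"
)
for soname, path in cudnn_cache_entries(sample):
    print(soname, "->", path)
```

On a real system the input could come from running ldconfig -p through the subprocess module. If only a MAJOR-level name such as libcudnn.so.5 is listed, a lookup by the full MAJOR.MINOR.PATCH name would not be served from the cache.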

Thanks for pointing this out. I tried “ldconfig -l /usr/local/cuda-8.0/lib64/libcudnn.so.5.1.5”,
but it doesn’t seem to work (/etc/ld.so.cache doesn’t change), although there is no error message.
“man ldconfig” gives no details or example usage for the -l option, and it says
"Intended for use by experts only". And I’m no expert (indeed). :(

Hi @apaszke,
I have the cuDNN 5.0 library on my PC, but I get this warning:

UserWarning: PyTorch was compiled without cuDNN support. To use cuDNN, rebuild PyTorch making sure the library is visible to the build system.
  "PyTorch was compiled without cuDNN support. To use cuDNN, rebuild "

How do I build PyTorch with cuDNN support?
cudnn.h is in /usr/local/cuda-8.0/include/cudnnv5/ and cudnn.so.5 is in /usr/local/cuda-8.0/lib64/cuDNNv5/. The library path has been added to the system environment variable:

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64/cuDNNv5:$LD_LIBRARY_PATH

I built PyTorch from source:

cd pytorch-root/ && python setup.py install

Hello, I have the same problem. Can you tell me how to handle this?