CUDNN_STATUS_NOT_INITIALIZED when using a CNN

I ran into RuntimeError: CUDNN_STATUS_NOT_INITIALIZED while training a model that contains CNN layers.

I checked my installation:

>>> import torch
>>> print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
True
>>> print(torch.backends.cudnn.version())
7005
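
For what it's worth, is_acceptable() and version() only confirm that the cuDNN library can be loaded; they don't actually run a convolution. A minimal sketch like the one below (my own check, assuming a PyTorch 0.4-style API, not code from AllenNLP) forces cuDNN to initialize and should reproduce the error outside of the full training loop:

import torch
import torch.nn as nn

# A tiny Conv1d on the GPU; the first convolution call is what actually
# creates the cuDNN handle, so this should raise the same
# CUDNN_STATUS_NOT_INITIALIZED error if the cuDNN setup is broken.
conv = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3).cuda()
x = torch.randn(4, 8, 32).cuda()   # (batch, channels, sequence length)
print(conv(x).shape)               # succeeds only if cuDNN initializes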

One thing that might be causing the problem, though I’m not sure:
On the server I’m using:

(test_cudnn) zhangjuexiao@ubuntu:/data/disk1/private/zhangjuexiao/allennlp$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

$ cat /usr/local/cuda-8.0/include/cudnn.h | grep CUDNN_MAJOR -A 2

#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21

But in my virtual environment I installed PyTorch with:

conda install pytorch torchvision cuda90 -c pytorch
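
Note that the nvcc and cudnn.h above describe the system-wide CUDA 8.0 / cuDNN 6 toolkit, while the conda cuda90 package ships its own CUDA 9.0 runtime, so PyTorch should not be picking up /usr/local/cuda-8.0 at all. As a sanity check (just my own sketch, not an official diagnostic), you can print what the interpreter actually loaded:

import torch

# Versions that PyTorch itself was built against and loaded at runtime;
# these come from the conda environment, not from /usr/local/cuda-8.0.
print(torch.version.cuda)               # should report the cuda90 build, e.g. "9.0"
print(torch.backends.cudnn.version())   # the bundled cuDNN, e.g. 7005
print(torch.cuda.is_available())        # False here would point to a driver problem
print(torch.cuda.get_device_name(0))    # which GPU the build can see

If torch.version.cuda reports 9.0 but the NVIDIA driver on the server is too old for CUDA 9.0, that mismatch may also surface as initialization errors.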

I’ve been tortured by this for days…
How can I solve this problem? Any help is appreciated!

Here is the detailed traceback, if needed:

File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/training/trainer.py", line 434, in _train_epoch
    loss = self._batch_loss(batch, for_training=True)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/training/trainer.py", line 371, in _batch_loss
    output_dict = self._model(**batch)
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/models/reading_comprehension/bidaf.py", line 174, in forward
    embedded_question = self._highway_layer(self._text_field_embedder(question))
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 52, in forward
    token_vectors = embedder(tensor)
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/modules/token_embedders/token_characters_encoder.py", line 36, in forward
    return self._dropout(self._encoder(self._embedding(token_characters), mask))
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/modules/time_distributed.py", line 35, in forward
    reshaped_outputs = self._module(*reshaped_inputs)
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/allennlp/allennlp/modules/seq2vec_encoders/cnn_encoder.py", line 106, in forward
    self._activation(convolution_layer(tokens)).max(dim=2)[0]
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 168, in forward
    self.padding, self.dilation, self.groups)
File "/data/disk1/private/zhangjuexiao/anaconda3/envs/test_cudnn/lib/python3.6/site-packages/torch/nn/functional.py", line 54, in conv1d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_NOT_INITIALIZED

Hi @juexZZ, any solution yet? I ran into the same issue when using a batch_size > 1. My system details are:
Cuda version: 9.0
Cudnn version: 7102
Pytorch version: 0.4.0
GPU: GTX 1080 Ti
Driver version: 390.77
OS: Ubuntu 16.04

With a batch_size of 1, the training loop works fine. Detailed traceback:

<ipython-input-20-8f012bcb5dc5> in forward(self, x)
      9 
     10     def forward(self, x):
---> 11         x = self.conv3d_1(x)
     12         x = self.conv3d_2(x)
     13         x = self.conv3d_3(x)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py in forward(self, input)
     89     def forward(self, input):
     90         for module in self._modules.values():
---> 91             input = module(input)
     92         return input
     93 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    419     def forward(self, input):
    420         return F.conv3d(input, self.weight, self.bias, self.stride,
--> 421                         self.padding, self.dilation, self.groups)
    422 
RuntimeError: CUDNN_STATUS_NOT_INITIALIZED
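
Since the error only shows up with batch_size > 1, one thing worth ruling out is GPU memory: CUDNN_STATUS_NOT_INITIALIZED is sometimes reported when cuDNN fails to allocate its handle or workspace because the card is nearly full, which would explain why batch_size = 1 still works. The sketch below (hypothetical shapes, not taken from the model above) isolates a single Conv3d call and prints how much memory PyTorch has already allocated before each attempt; checking nvidia-smi for other processes on the GPU may also help.

import torch
import torch.nn as nn

# Hypothetical shapes -- adjust channels and volume size to match the real input.
conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3).cuda()

for batch_size in (1, 2, 4):
    x = torch.randn(batch_size, 1, 32, 64, 64).cuda()
    print('batch_size', batch_size,
          'allocated MB:', torch.cuda.memory_allocated() // 2**20)
    y = conv(x)   # failing only for larger batches suggests memory pressure
    print('ok, output shape:', tuple(y.shape))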