How to debug RuntimeError: CUDNN_STATUS_NOT_SUPPORTED for conv_transpose2d

I want to build an autoencoder with the encoder part based on ResNets.

When I run my model, I get the following error message:

Traceback (most recent call last):
  File "train_pytorch_ae.py", line 129, in <module>
    preds, reconstruction, feats = model.l_out(inputs)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eavsteen/planet/configs_pytorch/vae_res_1-3.py", line 384, in forward
    bc, reconstruction, feats = self.resaenet(x)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eavsteen/planet/configs_pytorch/vae_res_1-3.py", line 357, in forward
    x = self.layer5(x)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eavsteen/planet/configs_pytorch/vae_res_1-3.py", line 227, in forward
    out = self.deconv2(out)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 524, in forward
    output_padding, self.groups, self.dilation)
  File "/home/eavsteen/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 131, in conv_transpose2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

I built a simple example to test out conv_transpose2d and it works fine:

import torch
import torch.nn as nn
from torch.autograd import Variable

conv = nn.Conv2d(512, 512, kernel_size=3, padding=1, stride=2, bias=False)
conv = conv.cuda()
deconv = nn.ConvTranspose2d(512, 512, kernel_size=3, padding=1, output_padding=1, stride=2, bias=False)
deconv = deconv.cuda()

x = Variable(torch.randn(16, 512, 16, 16).cuda(), requires_grad=True)

out = conv(x)
temp = out.cpu().data.numpy()
print 'after conv'
print temp.shape
out = deconv(out)
temp = out.cpu().data.numpy()
print 'after deconv'
print temp.shape
err = out.sum()
err.backward()

It nicely outputs the shapes as expected:

after conv
(16, 512, 8, 8)
after deconv
(16, 512, 16, 16)

So I printed the shape of the tensor in the more complicated model right before it goes into conv_transpose2d, and it is (16, 512, 8, 8) as expected, but I still get the RuntimeError.
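
Printing the shape only confirms the sizes, though, not the memory layout; a tensor can have the expected shape and still be non-contiguous. A minimal check I plan to drop in right before the failing call (sketch only; out and self.deconv2 stand for whatever the real decoder code uses):

# Sketch: inspect the tensor right before the transposed convolution.
# .data is used because this PyTorch version wraps tensors in autograd Variables.
print(out.size())                  # shape checks out: (16, 512, 8, 8)
print(out.data.is_contiguous())    # False here would match the hint in the error message
out = self.deconv2(out)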

How can I debug this problem? What does non-contiguous input actually mean?
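
From what I understand (my working assumption, not a confirmed answer): a contiguous tensor stores its elements in one dense, row-major block of memory. View operations such as transpose, permute, narrow or expand reuse the same storage with different strides, so the result can be non-contiguous even though its shape looks perfectly normal, and calling .contiguous() copies it back into a dense block. A small standalone illustration:

import torch

a = torch.randn(2, 3, 4, 5)
print(a.is_contiguous())      # True: freshly allocated, dense row-major layout

b = a.transpose(1, 3)         # a view: same storage, permuted strides
print(b.is_contiguous())      # False: elements are no longer row-major in memory

c = b.contiguous()            # copies the data into a new dense block
print(c.is_contiguous())      # True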

Ubuntu 16, CUDA 8; I tried both cuDNN 6.0 and cuDNN 5.1.

Some extra information:
The runtime error happens at the first deconvolution layer of the decoder stage, with the following definition:

self.deconv = nn.ConvTranspose2d(planes, planes, kernel_size=3, stride=stride,
                                 padding=1, output_padding=1, bias=False)
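
If the input really is non-contiguous, one workaround I have seen suggested (sketch only, untested here; out stands for the tensor reaching this layer) is to force a dense copy right before the transposed convolution:

# Inside the decoder block's forward pass:
out = out.contiguous()   # no-op if already contiguous, otherwise makes a dense copy
out = self.deconv(out)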

I also tested whether cuDNN is loaded correctly:

import torch
print(torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)))
print(torch.backends.cudnn.version())

output:

True
6021
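
One more isolation step I could try (a sketch, not yet run on this setup): disable cuDNN globally and rerun the model. If the error then disappears, the failure is specific to the cuDNN path for conv_transpose2d rather than to the model definition itself.

import torch

# Fall back to PyTorch's own (non-cuDNN) GPU kernels for all layers.
torch.backends.cudnn.enabled = False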

Hi Elias,

Sorry for the late reply. Is there any chance I can get access to your full script? I want to reproduce and fix this problem.