CUDNN_STATUS_NOT_SUPPORTED for large matrix input

The following code (adapted from the thread "CUDNN_STATUS_NOT_SUPPORTED error occurs when apply autograd.grad to compute high-order differentiation") gives me a CUDNN_STATUS_NOT_SUPPORTED error. I'm running PyTorch 0.3.0 with cuDNN 7.0.4, CUDA 9.0.176, and Python 3.6.

import torch
import torch.nn as nn
from torch.autograd import Variable, grad
import torch.utils.data as Data

class TestDataset(Data.Dataset):

    def __init__(self):
        self.sequences = []
        PROBLEM_SIZE = 171 * 21  # 3591 x 3591: large enough to trigger the error
        data = torch.rand(1, PROBLEM_SIZE, PROBLEM_SIZE)
        label = torch.rand(PROBLEM_SIZE, PROBLEM_SIZE).round()
        self.sequences.append((data, label))

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self,idx):
        return self.sequences[idx]

train_data = TestDataset()

train_loader = Data.DataLoader(
    dataset=train_data, batch_size=1, shuffle=True, num_workers=1)


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, dilation=1, padding=1, bias=False)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=3, stride=1, dilation=2, padding=2, bias=False)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x


cnn = CNN()
cnn.cuda()

loss_func = nn.BCEWithLogitsLoss()

for step, (data, label) in enumerate(train_loader):
    input = Variable(data).cuda()
    target = Variable(label).cuda()

    output = cnn(input)[0]  # drop the batch dimension to match the target shape
    loss = loss_func(output, target)

    # first-order gradients, keeping the graph so they can be differentiated again
    params = cnn.parameters()
    g = grad(loss, params, create_graph=True)

    # sum the gradients and differentiate once more (second-order term)
    g_sum = 0
    for g_para in g:
        g_sum += g_para.sum()

    params = cnn.parameters()
    hv = grad(g_sum, params, create_graph=True)

    break

Running each convolution individually works, as does running this code with smaller input matrices. I'm pretty sure the problem also occurs for sizes other than 3591 × 3591. The error I receive is the following:

  File "/home/n/nikostr/pfs/CompBio-DD2402/experiments/20171209/mwe.py", line 54, in <module>
    g = grad(loss, params, create_graph=True)
  File "/home/n/nikostr/pfs/.conda/envs/myroot/lib/python3.6/site-packages/torch/autograd/__init__.py", line 158, in grad
    inputs, only_inputs, allow_unused)
RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Is anyone able to reproduce this, or is the problem on my end?
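In case it helps narrow things down, one check would be to force the convolutions off the cuDNN path entirely and see whether the double-backward then goes through. A minimal sketch (just a way to isolate the backend, not something from the traceback above):

import torch

# Globally disable cuDNN so nn.Conv2d falls back to PyTorch's native kernels.
# If the script then completes, the failure is specific to the cuDNN path
# rather than to autograd's double-backward itself.
torch.backends.cudnn.enabled = False

The native kernels are typically slower, but this at least separates a cuDNN limitation on large inputs from a genuine autograd problem.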

I’ve been able to reproduce this on 0.3 and on master. Could be a bug.

Thank you! I’ve submitted this as an issue at:

Has this been solved in v0.4?