cuDNN error when using convolution layer

Hi everyone, I get an error message when I try to run a convolutional layer. The layer is defined as follows:

self.l_conv = nn.Conv2d(in_channels=self.arch['out_channels'][-1],out_channels=channels,kernel_size=3,padding=1)

Part of the error message is the following:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-b5acd2999739> in <module>()
     34         fake_label = torch.randint(0,num_classes,(n_batch,))
     35         fake_label = Variable(fake_label.cuda())
---> 36         G_out = netG(in_noise,fake_label)
     37 
     38         D_out_fake = netD(G_out,fake_label).squeeze()

<ipython-input-12-09ab6878f79c> in forward(self, z, y)
     65         out = self.bn(h)
     66         out = self.act(h)
---> 67         out = self.l_conv(h)
     68         out = torch.tanh(out)
     69 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in forward(self, input)
    343 
    344     def forward(self, input):
--> 345         return self.conv2d_forward(input, self.weight)
    346 
    347 class Conv3d(_ConvNd):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py in conv2d_forward(self, input, weight)
    340                             _pair(0), self.dilation, self.groups)
    341         return F.conv2d(input, weight, self.bias, self.stride,
--> 342                         self.padding, self.dilation, self.groups)
    343 
    344     def forward(self, input):

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

This line works perfectly when I run it outside my training loop, so I am a bit confused about what is wrong. Thanks in advance.

Hi,

There are two possible reasons for this kind of error: 1. a bug in cuDNN that was fixed in later versions (so update cuDNN if possible); 2. the input to self.l_conv might be too large, so check the batch size and the input dimensions coming from the previous layer. Also, try running your script on the CPU.
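The checks above can be sketched roughly as follows. This is only a sketch: `describe_input` and `run_on_cpu` are hypothetical helper names, not PyTorch APIs, and the small sizes in the usage section are placeholders rather than the sizes from the failing run.

```python
import copy

import torch
import torch.nn as nn


def describe_input(x):
    """Collect the tensor properties that are worth checking before a conv call."""
    return {
        "shape": tuple(x.shape),
        "dtype": x.dtype,
        "device": str(x.device),
        "contiguous": x.is_contiguous(),
    }


def run_on_cpu(layer, x):
    """Run a copy of `layer` on the CPU, so cuDNN is not involved at all."""
    cpu_layer = copy.deepcopy(layer).cpu()
    with torch.no_grad():
        return cpu_layer(x.detach().cpu())


if __name__ == "__main__":
    # Returns None when cuDNN is not available in this build.
    print("cuDNN version:", torch.backends.cudnn.version())

    conv = nn.Conv2d(4, 1, kernel_size=3, padding=1)  # placeholder sizes
    h = torch.randn(2, 4, 8, 8)
    print(describe_input(h))
    print(run_on_cpu(conv, h).shape)
```

Inside the model, printing `describe_input(h)` right before `self.l_conv(h)` shows the shape and contiguity of the tensor that actually reaches the layer. If it reports `contiguous: False`, calling `h = h.contiguous()` before the conv is a cheap way to test the hint in the error message.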

Thanks for replying. I am working on Colab, and I don’t know whether it is possible to install cuDNN updates there. I tried reducing the batch size to 16 and the problem is still there. I am going to try on the CPU now.

Could you post a code snippet to reproduce this issue, please?
We would need the convolution setup (in_channels, out_channels) as well as the shape of the input you are passing to this conv to debug this issue and check whether this bug has already been fixed.

Sure, the class is the following:

class Generator(nn.Module):
    def __init__(self,n_gpu):
        super(Generator,self).__init__()
        self.n_gpu = n_gpu
        self.G_ch_mul = G_ch_mul  #Channel width multiplier
        self.resolution = resolution    # Resolution of the OUTPUT
        self.G_depth = G_depth          # Number of resblocks per stage
        self.arch = G_arch(self.G_ch_mul)[resolution]
        self.G_init = G_init

        self.linear = nn.Linear(noise_dim,self.arch['in_channels'][0]*(4**2))  #(N,noise-dim) > (N,64*16*4*4)

        self.blocks = []
        for index in range(len(self.arch['out_channels'])):                  #In total we have 5 cycles
          self.blocks += [[GBlock(input_ch=self.arch['in_channels'][index],  #(16,16,8,4,2) * 64
                                  output_ch=self.arch['in_channels'][index] if g_index==0 else self.arch['out_channels'][index], #((16,16),(16,8),(8,4),(4,2),(2,1))*64  (O1,O2)
                                  upsample=(functools.partial(F.interpolate, scale_factor=2) if self.arch['upsample'][index] and g_index == (self.G_depth-1) else None))]
                         for g_index in range(self.G_depth)]

          if self.arch['attention'][self.arch['resolution'][index]]:
            print('Adding attention layer in G at resolution %d' % self.arch['resolution'][index])
            self.blocks[-1] += [Attention(self.arch['out_channels'][index], SNConv2d)]

        self.blocks = nn.ModuleList([nn.ModuleList(block) for block in self.blocks])

        #self.output_layer = nn.Sequential(nn.BatchNorm2d(self.arch['out_channels'][-1]),
        #                                  nn.ReLU(True),
        #                                  nn.Conv2d(self.arch['out_channels'][-1],channels,kernel_size=3,padding=1))

        self.bn = nn.BatchNorm2d(self.arch['out_channels'][-1])
        self.act = nn.ReLU(True)
        self.l_conv = nn.Conv2d(in_channels=self.arch['out_channels'][-1],out_channels=channels,kernel_size=3,padding=1)
     
    def forward(self,z,y):

        h = self.linear(z)   #(N,noise_dim) > (N,64*4*4*16)
        h = h.view(h.size(0),-1,4,4)  #(N,64*4*4*16) > (N,16*64,4,4)
        for index, blocklist in enumerate(self.blocks):
            # Second inner loop in case block has multiple layers
            for block in blocklist:
              h = block(h, y)
        out = self.bn(h)
        out = self.act(h)
        out = self.l_conv(h)
        out = torch.tanh(out)

        return out

I don’t know how useful it is, since the same line works well outside my training loop. The conv has in_channels=128 and out_channels=1, and the input has Win=128 and Hin=128.
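A standalone reproduction with these sizes might look like the sketch below. The batch size of 16 is the reduced value mentioned earlier in the thread. The helper name `run_conv` is made up for illustration, and the input here is random and contiguous, so this only tests the layer in isolation, not the tensor produced by the generator blocks.

```python
import torch
import torch.nn as nn

# Sizes quoted in the thread: batch 16, 128 input channels, 1 output channel, 128x128 input.
N, C_IN, C_OUT, H, W = 16, 128, 1, 128, 128


def run_conv(device, batch=N, height=H, width=W):
    """Build the conv in isolation and run one forward and one backward pass."""
    conv = nn.Conv2d(C_IN, C_OUT, kernel_size=3, padding=1).to(device)
    x = torch.randn(batch, C_IN, height, width, device=device, requires_grad=True)
    out = torch.tanh(conv(x))
    out.mean().backward()  # the earlier failure showed up in a backward pass
    return out.shape


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("PyTorch:", torch.__version__, "cuDNN:", torch.backends.cudnn.version())
    print(run_conv(device))
```

If this runs cleanly on the GPU, the layer configuration alone is probably not the trigger, and the tensor coming out of the last block (its shape, dtype and memory layout) is the next thing to inspect.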

Oh, and I had the same issue before, but it was when I was using Discriminator.backward(). I could solve it by changing some lines in my code. I was using the following structure:

        h = self.conv1(self.activation(self.bn1(z,y)))   

I split the code across separate lines and that error message disappeared, but then I hit the current one.
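One detail worth checking when a nested call is split across lines: each step has to consume the previous step's result. In the forward method posted above, the results of self.bn and self.act are assigned to out, but self.l_conv is still called on h, so the batch-norm output never reaches the conv. Whether this has anything to do with the cuDNN error is not established here. A minimal sketch of the chained version, using a hypothetical `OutputHead` module with plain `nn.BatchNorm2d` standing in for the generator's last three layers:

```python
import torch
import torch.nn as nn


class OutputHead(nn.Module):
    """Hypothetical stand-in for the bn -> act -> l_conv tail of the generator."""

    def __init__(self, in_channels=128, out_channels=1):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU()
        self.l_conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, h):
        out = self.bn(h)        # normalize the block output
        out = self.act(out)     # pass the normalized tensor on, not h
        out = self.l_conv(out)  # convolve the activated tensor
        return torch.tanh(out)

    def forward_nested(self, h):
        # Same computation written as a single nested expression.
        return torch.tanh(self.l_conv(self.act(self.bn(h))))
```

Both methods compute the same result, so splitting the expression should not change the output as long as every line passes `out` forward.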