cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hi, I got the following error:
Files already downloaded and verified
start…
Traceback (most recent call last):
  File "test.py", line 218, in <module>
    main()
  File "test.py", line 83, in main
    fake = G(noise)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lilipan/ldagan/model.py", line 81, in forward
    h = self.block2(h)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lilipan/ldagan/model.py", line 54, in forward
    return self.residual(input) + self.shortcut(input)
  File "/home/lilipan/ldagan/model.py", line 38, in residual
    h = self.b1(h)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

class genBlock(nn.Module):
    def __init__(self, in_channels, out_channels,
                 activation=F.relu, hidden_channels=None, ksize=3, pad=1, upsample=False, n_classes=0):
        super(genBlock, self).__init__()
        self.activation = activation
        self.upsample = upsample
        self.learnable_sc = in_channels != out_channels or upsample
        hidden_channels = out_channels if hidden_channels is None else hidden_channels
        self.n_classes = n_classes
        self.c1 = nn.Conv2d(in_channels, hidden_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c1.weight.data, math.sqrt(2))
        self.c2 = nn.Conv2d(hidden_channels, out_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c2.weight.data, math.sqrt(2))
        self.b1 = nn.BatchNorm2d(in_channels)
        self.b2 = nn.BatchNorm2d(hidden_channels)
        if self.learnable_sc:
            self.c_sc = nn.Conv2d(in_channels, out_channels, kernel_size=ksize, padding=pad)

The error seems to come from self.b1: when I delete self.b1, the model runs.
Graphics card: RTX 2080 Ti, PyTorch 1.0.0

Interestingly, the error also occurs at self.b1 when I switch to PyTorch 0.4.1, but with a different message:
RuntimeError: cuDNN error: CUDNN_STATUS_SUCCESS

I also ran the same code on another machine, where it worked fine.
Graphics card: GTX 1080 Ti, PyTorch 0.4.1

Hi,

This is almost certainly a CUDA/cuDNN/RTX issue: to run on RTX cards you need CUDA 10 and a compatible cuDNN version.
If you still see the error with that setup, could you please post a minimal code sample that reproduces it?

import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import numpy as np
class genBlock(nn.Module):
    def __init__(self, in_channels, out_channels,
                 activation=F.relu, hidden_channels=None, ksize=3, pad=1, upsample=False, n_classes=0):
        super(genBlock, self).__init__()
        self.activation = activation
        self.upsample = upsample
        self.learnable_sc = in_channels != out_channels or upsample
        hidden_channels = out_channels if hidden_channels is None else hidden_channels
        self.n_classes = n_classes
        self.c1 = nn.Conv2d(in_channels, hidden_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c1.weight.data, math.sqrt(2))
        self.c2 = nn.Conv2d(hidden_channels, out_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c2.weight.data, math.sqrt(2))
        self.b1 = nn.BatchNorm2d(in_channels)
        self.b2 = nn.BatchNorm2d(hidden_channels)
        if self.learnable_sc:
            self.c_sc = nn.Conv2d(in_channels, out_channels, kernel_size=ksize, padding=pad)
    def residual(self, x):
        h = x
        h = self.b1(h)
        h = self.activation(h)
        h = upsample_conv(h, self.c1) if self.upsample else self.c1(h)
        h = self.b2(h)
        h = self.activation(h)
        h = self.c2(h)
        return h

    def shortcut(self, x):
        if self.learnable_sc:
            x = upsample_conv(x, self.c_sc) if self.upsample else self.c_sc(x)
            return x
        else:
            return x

    def forward(self, input):
        return self.residual(input) + self.shortcut(input)
if __name__== "__main__":

    noise = torch.randn(1,256, 4, 4).cuda()
    g = genBlock(256, 256, activation=F.relu, upsample=True).cuda()
    #g.apply(weights_init)
    out = g(noise)
    print(out.shape)

Thank you. I am sure that I have CUDA 10 and cuDNN installed.
I installed these two packages:
cudnn-10.0-linux-x64-v7.4.1.5.tgz
cuda_10.0.130_410.48_linux.run

I also ran the cuDNN samples successfully.

By the way, there is no error when I run on the CPU.
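
Since the CPU run succeeds, one way to narrow this down might be to run a single BatchNorm2d layer by itself: on the CPU, on the GPU with cuDNN, and on the GPU with cuDNN turned off through torch.backends.cudnn.enabled. The sketch below does that. The helper name try_batchnorm is made up for illustration, and the input shape is borrowed from the repro script above.

```python
import torch
import torch.nn as nn


def try_batchnorm(device, use_cudnn=True):
    """Run one BatchNorm2d forward pass; return the output shape or the error text."""
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        bn = nn.BatchNorm2d(256).to(device)
        x = torch.randn(1, 256, 4, 4, device=device)
        return tuple(bn(x).shape)
    except RuntimeError as exc:
        # Keep the message so the three runs can be compared side by side.
        return str(exc)
    finally:
        # Restore the global flag no matter what happened.
        torch.backends.cudnn.enabled = previous


if __name__ == "__main__":
    print("cpu:", try_batchnorm("cpu"))
    if torch.cuda.is_available():
        print("cuda, cuDNN on: ", try_batchnorm("cuda", use_cudnn=True))
        print("cuda, cuDNN off:", try_batchnorm("cuda", use_cudnn=False))
```

If only the GPU run with cuDNN enabled fails, that would point at the cuDNN code path rather than at the model definition.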

I had to set upsample=False because otherwise upsample_conv is not defined.
With that change, I can run this properly on a Titan Black and a Titan X with CUDA 8.0.
Unfortunately I don’t have an RTX 2080 Ti card :sob: so I can’t check on that…

@ngimel, maybe you will be able to reproduce this with the same setup?

@sumching, when I try to run your script I get NameError: name 'upsample_conv' is not defined. Can you please copy-paste its definition? With upsample=False it runs properly on a 2080.

@ngimel

def _upsample(x):
    h, w = x.shape[2:]
    return F.upsample_bilinear(x, size=(h * 2, w * 2))


def upsample_conv(x, conv):
    return conv(_upsample(x))

The problem seems to be BatchNorm. It works when I delete self.b1, and I did not find any problem with self.b2. With the first BatchNorm placed in this position, the code does not run.
When I set upsample=False, it raises the same error.
Can you please tell me how to check whether my CUDA and cuDNN installation is correct? I have installed several CUDA versions; CUDA 10.0 is installed in /usr/local/CUDA-10.0, and I added that path to the environment variables. Thank you.
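
A possible starting point for that check is to ask PyTorch itself which CUDA and cuDNN versions it was built with, since (as far as I know) the prebuilt pip and conda binaries bundle their own CUDA runtime and cuDNN, so a toolkit under /usr/local is not necessarily what gets used. The sketch below assumes Python 3; the function name describe_torch_env and the choice of environment variables to print are my own, not something from this thread.

```python
import importlib.util
import os


def describe_torch_env():
    """Collect the versions PyTorch reports plus a few CUDA-related environment variables."""
    info = {key: os.environ.get(key) for key in ("CUDA_HOME", "LD_LIBRARY_PATH")}
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None  # PyTorch is not importable in this interpreter
        return info

    import torch

    info["torch"] = torch.__version__
    info["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn"] = (
        torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None
    )
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(key, "=", value)
```

If built_with_cuda reports something older than 10.0 on an RTX card, that mismatch would be worth fixing first, regardless of which toolkit is installed system-wide.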

Just confirming that I’m having the exact same issue on an RTX 2070 (CUDA 10.0, PyTorch 1.0.0) with the first BatchNorm2d layer in my GAN generator. It looks roughly like this:

# bs=64, latent_sz=128, init_sz=8
z = torch.randn(bs, latent_sz).cuda()
generator(z) # generator = Generator().cuda()

# In generator forward(self, x):
x = self.fc(x).view(bs, 128, 8, 8) # fc is nn.Linear(latent_sz, hidden_sz * init_sz**2)
x = self.bn1(x)      # see nn.Sequential() below for init. Real code puts x through the Sequential
...

If I remove this BatchNorm2d (self.bn1 in the example) from the nn.Sequential it works. The nn.Sequential looks as follows:

# hidden_sz=128, init_sz=8
nn.Sequential(
    nn.BatchNorm2d(hidden_sz),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(hidden_sz, hidden_sz, 3, stride=1, padding=1),
    nn.BatchNorm2d(hidden_sz, 0.8),  # note: 0.8 is passed positionally, i.e. as eps
    nn.LeakyReLU(0.2, inplace=True),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(hidden_sz, hidden_sz // 2, 3, stride=1, padding=1),
    nn.BatchNorm2d(hidden_sz // 2, 0.8),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(hidden_sz // 2, nc, 3, stride=1, padding=1),
    nn.Tanh()
)

I have the same error on a 2080 Ti: batch norm fails with this error.

I have the same issue with a 2080 Ti. The model runs fine on another machine with a 1080 Ti and a similar environment. I have:
torch: 1.0.1.post2
cuda: 10.0.130
cudnn: 7402

The weird thing is that all my other models work with this setup, even on the 2080 Ti machine.
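
For readers unsure how to read the cudnn: 7402 value: I am assuming it is the integer that torch.backends.cudnn.version() returns, which for cuDNN 7.x is encoded as major * 1000 + minor * 100 + patch. A small sketch that decodes it (the helper name is made up, and the encoding is only assumed to hold for the 7.x series discussed here):

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x version integer into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    print(decode_cudnn_version(7402))
```

Under that assumption, 7402 corresponds to cuDNN 7.4.2.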

It seems to happen when I increase the size of my model. This fails for me:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

# This model doesn't throw the error
# t = nn.Sequential(nn.Conv3d(3,128, (9,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(True), nn.Conv3d(128,256, (1,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(),nn.Conv3d(256,256, (1,4,4), stride=(1,2,2), padding=(0,1,1))).to(device)

# This larger model fails
t = nn.Sequential(
    nn.Conv3d(3, 256, (9, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
    nn.ReLU(True),
    nn.Conv3d(256, 512, (1, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
    nn.ReLU(),
    nn.Conv3d(512, 512, (1, 4, 4), stride=(1, 2, 2), padding=(0, 1, 1)),
).to(device)
i = torch.ones([10, 3, 9, 64, 96]).to(device)
o = t(i)
criterion = nn.L1Loss()
loss = criterion(o, torch.ones_like(o))
loss.backward()
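
To put a number on "increase the size", here is a rough pure-Python sketch that works out the output shape and the parameter count of both variants from the standard convolution formulas. The helper names are mine, and this only quantifies the difference between the two models; it does not explain the cuDNN error.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of one convolution dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a Conv3d layer."""
    kd, kh, kw = kernel
    return out_ch * in_ch * kd * kh * kw + out_ch


def describe(channels, input_dhw=(9, 64, 96)):
    """Return (output D/H/W, total parameters) for the three-layer stack above."""
    kernels = [(9, 4, 4), (1, 4, 4), (1, 4, 4)]
    stride, padding = (1, 2, 2), (0, 1, 1)
    dhw, params = input_dhw, 0
    for in_ch, out_ch, kernel in zip(channels, channels[1:], kernels):
        dhw = tuple(
            conv_out(s, k, st, p) for s, k, st, p in zip(dhw, kernel, stride, padding)
        )
        params += conv3d_params(in_ch, out_ch, kernel)
    return dhw, params


if __name__ == "__main__":
    print("working model:", describe([3, 128, 256, 256]))
    print("failing model:", describe([3, 256, 512, 512]))
```

Both variants end at a 1 x 8 x 12 output per channel; the failing one has 6,403,328 parameters against 1,628,800, roughly 3.9 times as many.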

OK, I finally got it working. It seems to be some incompatibility between cudatoolkit, cuDNN and PyTorch. In the end I got it working by compiling PyTorch from source, with CUDA toolkit 10.0, cuDNN 7.5 and PyTorch master (latest greatest yay!).

Well, I still have this problem with BatchNorm1d on cuDNN. Does anyone have a solution for that?

Could you please post the error message, the batchnorm setup, the input shapes, and your PyTorch, CUDA and cuDNN versions, so that we can debug it?
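
A report along those lines could be produced with something like the sketch below. It is only a template: the layer size, batch size and function name are placeholders I picked, not the setup of the poster above, and they should be replaced with the values from the failing model. For the version numbers, running python -m torch.utils.collect_env should print them in one go.

```python
import torch
import torch.nn as nn


def batchnorm1d_repro(num_features=128, batch_size=64):
    """Run one BatchNorm1d forward pass and record the setup, shapes and any error."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    bn = nn.BatchNorm1d(num_features).to(device)
    x = torch.randn(batch_size, num_features, device=device)
    report = {
        "layer": repr(bn),
        "device": str(device),
        "input_shape": tuple(x.shape),
        "output_shape": None,
        "error": None,
    }
    try:
        report["output_shape"] = tuple(bn(x).shape)
    except RuntimeError as exc:
        # This is the message to paste into the thread.
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in batchnorm1d_repro().items():
        print(key, "=", value)
```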