cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Sumching · December 12, 2018, 12:16pm

Hi, I got the following error:
Files already downloaded and verified
start…
Traceback (most recent call last):
  File "test.py", line 218, in <module>
    main()
  File "test.py", line 83, in main
    fake = G(noise)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lilipan/ldagan/model.py", line 81, in forward
    h = self.block2(h)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lilipan/ldagan/model.py", line 54, in forward
    return self.residual(input) + self.shortcut(input)
  File "/home/lilipan/ldagan/model.py", line 38, in residual
    h = self.b1(h)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

class genBlock(nn.Module):
    def __init__(self, in_channels, out_channels,
                 activation=F.relu, hidden_channels=None, ksize=3, pad=1, upsample=False, n_classes=0):
        super(genBlock, self).__init__()
        self.activation = activation
        self.upsample = upsample
        self.learnable_sc = in_channels != out_channels or upsample
        hidden_channels = out_channels if hidden_channels is None else hidden_channels
        self.n_classes = n_classes
        self.c1 = nn.Conv2d(in_channels, hidden_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c1.weight.data, math.sqrt(2))
        self.c2 = nn.Conv2d(hidden_channels, out_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c2.weight.data, math.sqrt(2))
        self.b1 = nn.BatchNorm2d(in_channels)
        self.b2 = nn.BatchNorm2d(hidden_channels)
        if self.learnable_sc:
            self.c_sc = nn.Conv2d(in_channels, out_channels, kernel_size=ksize, padding=pad)

The traceback points to a problem in self.b1. When I deleted self.b1, the code worked.
Graphics card: RTX 2080 Ti, PyTorch 1.0.0

Interestingly, the failure also occurs at self.b1 when I switch to PyTorch 0.4.1, but with a different message:
RuntimeError: cuDNN error: CUDNN_STATUS_SUCCESS

I also ran the same code on another machine, where it worked fine.
Graphics card: GTX 1080 Ti, PyTorch 0.4.1

albanD · December 12, 2018, 12:25pm

Hi,

This is almost certainly a CUDA/cuDNN/RTX issue: RTX cards need CUDA 10 and a compatible cuDNN, so make sure you have both.
If you still see the error with that setup, could you post a minimal code sample that reproduces the issue, please?

Sumching · December 12, 2018, 1:08pm

import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import numpy as np
class genBlock(nn.Module):
    def __init__(self, in_channels, out_channels,
                 activation=F.relu, hidden_channels=None, ksize=3, pad=1, upsample=False, n_classes=0):
        super(genBlock, self).__init__()
        self.activation = activation
        self.upsample = upsample
        self.learnable_sc = in_channels != out_channels or upsample
        hidden_channels = out_channels if hidden_channels is None else hidden_channels
        self.n_classes = n_classes
        self.c1 = nn.Conv2d(in_channels, hidden_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c1.weight.data, math.sqrt(2))
        self.c2 = nn.Conv2d(hidden_channels, out_channels, kernel_size=ksize, padding=pad)
        #nn.init.xavier_uniform_(self.c2.weight.data, math.sqrt(2))
        self.b1 = nn.BatchNorm2d(in_channels)
        self.b2 = nn.BatchNorm2d(hidden_channels)
        if self.learnable_sc:
            self.c_sc = nn.Conv2d(in_channels, out_channels, kernel_size=ksize, padding=pad)
    def residual(self, x):
        h = x
        h = self.b1(h)
        h = self.activation(h)
        h = upsample_conv(h, self.c1) if self.upsample else self.c1(h)
        h = self.b2(h)
        h = self.activation(h)
        h = self.c2(h)
        return h

    def shortcut(self, x):
        if self.learnable_sc:
            x = upsample_conv(x, self.c_sc) if self.upsample else self.c_sc(x)
            return x
        else:
            return x

    def forward(self, input):
        return self.residual(input) + self.shortcut(input)
if __name__ == "__main__":

    noise = torch.randn(1, 256, 4, 4).cuda()
    g = genBlock(256, 256, activation=F.relu, upsample=True).cuda()
    #g.apply(weights_init)
    out = g(noise)
    print(out.shape)

Thank you. I am sure that I have CUDA 10 and cuDNN installed.
I installed these two packages:
cudnn-10.0-linux-x64-v7.4.1.5.tgz
cuda_10.0.130_410.48_linux.run

I’ve successfully tested the cudnn_sample

By the way, there is no error when I run on CPU

albanD · December 12, 2018, 1:44pm

I had to set upsample=False because otherwise upsample_conv is not defined.
With that change, the script runs properly on a Titan Black and a Titan X with CUDA 8.0.
Unfortunately I don’t have an RTX 2080 Ti card, so I can’t check on that…

@ngimel, would you be able to reproduce this with the same setup?

ngimel · December 12, 2018, 6:24pm

@sumching when I try to run your script I get NameError: name 'upsample_conv' is not defined. Can you please copy and paste how it’s defined? With upsample=False it runs properly on a 2080.

Sumching · December 13, 2018, 4:19am

@ngimel

def _upsample(x):
    h, w = x.shape[2:]
    return F.upsample_bilinear(x, size=(h * 2, w * 2))


def upsample_conv(x, conv):
    return conv(_upsample(x))

The problem seems to be BatchNorm. The code works when I delete self.b1, and I have not found any problem with self.b2. When the first BatchNorm is placed in this position, the code fails to run.
Setting upsample=False raises the same error.
Can you please tell me how to check whether my CUDA and cuDNN installation is correct? I have several CUDA versions installed; CUDA 10.0 is installed in /usr/local/CUDA-10.0, and I added that path to the environment variables. Thank you.
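
One way to check which CUDA and cuDNN versions PyTorch itself is using is to ask PyTorch directly. The following is a minimal sketch, not an official diagnostic; the helper name describe_environment is made up for illustration. Prebuilt PyTorch binaries typically ship their own CUDA runtime and cuDNN, so the values reported here can differ from whatever is installed under /usr/local. On a Turing card such as the RTX 2080 Ti, the reported compute capability should be (7, 5).

```python
def describe_environment():
    """Collect version details relevant to cuDNN errors (illustrative helper)."""
    info = {}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable from this interpreter.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is not available.
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first GPU.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

If cuda_build reports a version older than 10.0 on an RTX card, the installed PyTorch build is a likely suspect, regardless of the system-wide CUDA installation.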

xeTaiz · January 18, 2019, 2:25pm

Just confirming that I’m having the exact same issue on RTX 2070 (CUDA 10.0, Pytorch 1.0.0) with my first BatchNorm2d layer in my GAN generator. Looks roughly like this:

# bs=64, latent_sz=128, init_sz=8
z = torch.randn(bs, latent_sz).cuda()
generator(z) # generator = Generator().cuda()

# In generator forward(self, x):
x = self.fc(x).view(bs, 128, 8, 8) # fc is nn.Linear(latent_sz, hidden_sz * init_sz**2)
x = self.bn1(x)      # see nn.Sequential() below for init. Real code puts x through the Sequential
...

If I remove this BatchNorm2d (self.bn1 in the example) from the nn.Sequential it works. The nn.Sequential looks as follows:

# hidden_sz=128, init_sz=8
nn.Sequential(
    nn.BatchNorm2d(hidden_sz),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(hidden_sz, hidden_sz, 3, stride=1, padding=1),
    nn.BatchNorm2d(hidden_sz, 0.8),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(hidden_sz, hidden_sz // 2, 3, stride=1, padding=1),
    nn.BatchNorm2d(hidden_sz // 2, 0.8),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(hidden_sz // 2, nc, 3, stride=1, padding=1),
    nn.Tanh()
)

pkadambi · March 20, 2019, 5:31pm

I have the same error on a 2080 Ti: batch norm fails and raises an error.

Sfitsos · March 26, 2019, 5:50pm

I have the same issue with a 2080 Ti. The model runs fine on another machine with a 1080 Ti and a similar environment. I have:
torch: 1.0.1.post2
cuda: 10.0.130
cudnn: 7402

The weird thing is that my other models all work with this setup even on the 2080ti machine.
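
The cudnn value 7402 listed above is the integer that cuDNN reports as its version. Assuming the encoding used in the cuDNN 7.x header (major * 1000 + minor * 100 + patch level), it can be decoded as sketched here; decode_cudnn_version is an illustrative helper, not a PyTorch or cuDNN function.

```python
def decode_cudnn_version(version):
    """Split a cuDNN 7.x style version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    print("cuDNN %d.%d.%d" % decode_cudnn_version(7402))
```

Under that assumption, 7402 corresponds to cuDNN 7.4.2.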

Sfitsos · March 26, 2019, 8:01pm

It seems to happen when I increase the size of my model. This fails for me:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

# This model doesn't throw the error
# t = nn.Sequential(nn.Conv3d(3,128, (9,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(True), nn.Conv3d(128,256, (1,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(),nn.Conv3d(256,256, (1,4,4), stride=(1,2,2), padding=(0,1,1))).to(device)

t = nn.Sequential(nn.Conv3d(3,256, (9,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(True), nn.Conv3d(256,512, (1,4,4), stride=(1,2,2), padding=(0,1,1)), nn.ReLU(),nn.Conv3d(512,512, (1,4,4), stride=(1,2,2), padding=(0,1,1))).to(device)
i = torch.ones([10, 3, 9, 64, 96]).to(device)
o = t(i)
criterion = nn.L1Loss()
loss = criterion(o, torch.ones_like(o))
loss.backward()

Sfitsos · March 26, 2019, 11:34pm

OK, I finally got it working. It seems to be some incompatibility between cudatoolkit, cuDNN and PyTorch. In the end I got it working by compiling PyTorch from source, using CUDA toolkit 10.0, cuDNN 7.5 and PyTorch master (latest greatest yay!)

ryuxin · August 6, 2020, 5:19am

Well, I still have this problem with BatchNorm1d on cuDNN. Does anyone have a solution for that?

ptrblck · August 8, 2020, 10:02am

Could you post the error message, the batchnorm setup, input shapes, PyTorch, CUDA and cudnn versions, so that we could debug it please?
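
A small script along these lines could gather part of that information. It is only a sketch: try_batchnorm is a hypothetical helper, and the feature count and batch size are placeholders to be replaced with the real setup. It runs one BatchNorm1d forward pass with cuDNN enabled and then disabled through torch.backends.cudnn.enabled, to see whether the failure is specific to the cuDNN path.

```python
def try_batchnorm(num_features=128, batch=64, use_cudnn=True):
    """Run one BatchNorm1d forward pass on the GPU; return (ok, message)."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return False, "torch is not installed"
    if not torch.cuda.is_available():
        return False, "CUDA is not available"

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn
    try:
        bn = nn.BatchNorm1d(num_features).cuda()
        x = torch.randn(batch, num_features).cuda()
        out = bn(x)
        # Wait for the kernel so asynchronous errors surface here.
        torch.cuda.synchronize()
        return True, "output shape {}".format(tuple(out.shape))
    except RuntimeError as err:
        return False, str(err)
    finally:
        # Restore the global cuDNN switch.
        torch.backends.cudnn.enabled = previous


if __name__ == "__main__":
    for flag in (True, False):
        ok, message = try_batchnorm(use_cudnn=flag)
        print("cudnn enabled={}: ok={} ({})".format(flag, ok, message))
```

A failed CUDA call can leave the process in a bad state, so running each setting in a fresh process is the safer way to compare the two results.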