CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input

Hi all,
I'm training a CNN and the error message below comes up:
"CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."
How can I resolve it?
Is there any way I can fix it without using a DataLoader?

Total input: (117000, 51, 51), with a batch size of 32.

DataLoader:

trainloader = DataLoader(dataset=train_data, batch_size=32, num_workers=0, shuffle=True, drop_last=True)
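For context, the Dataset itself looks roughly like this (a simplified sketch, not my exact code; the array names are illustrative):

import torch
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    def __init__(self, inputs, targets):
        # inputs: numpy array of shape (117000, 51, 51), targets: (117000,) class labels
        self.inputs = torch.from_numpy(inputs).float()
        self.targets = torch.from_numpy(targets).long()

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # add the channel dimension that Conv2d expects: (1, 51, 51)
        return self.inputs[idx].unsqueeze(0), self.targets[idx]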

Thank you very much in advance !!

Could you post the model definition so that we can reproduce this error?

Hi ptrblck, please refer to the code below.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn = nn.Sequential(
            # input: (batch, 1, 51, 51)
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(64),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # (25, 25)

            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(128),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # (12, 12)

            torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(256),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(256),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # (6, 6)

            torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(512),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1, bias=True),
            torch.nn.BatchNorm2d(512),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # (3, 3)
        )
        # output of self.cnn: (batch, 512, 3, 3)
        self.linear = nn.Sequential(
            torch.nn.Linear(512 * 3 * 3, 2048),
            torch.nn.ReLU(),
            torch.nn.Linear(2048, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 3),
        )

    def forward(self, x):
        out = self.cnn(x)
        # flatten to (batch, 512*3*3) to match the first Linear layer
        out = self.linear(out.view(-1, 512 * 3 * 3))
        return out
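Just for reference, a quick shape sanity check on the CPU (a minimal sketch):

import torch
import torch.nn as nn

model = Net()
x = torch.randn(2, 1, 51, 51)  # dummy batch of 2 grayscale 51x51 inputs
print(model(x).shape)          # torch.Size([2, 3])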

Thanks for the code! Which PyTorch, CUDA and cudnn versions are you using?

Thanks for the quick response!
I’m using CUDA!

device = 'cuda:4' if torch.cuda.is_available() else 'cpu'

By the way,
if I check the CUDA version (torch.version.cuda), it says ‘9.2.148’,
and torch.backends.cudnn.version() returns 7603.
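For completeness, this is how I checked the versions:

import torch

print(torch.__version__)               # PyTorch version
print(torch.version.cuda)              # -> 9.2.148
print(torch.backends.cudnn.version())  # -> 7603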

May I get any clues about this problem? :slight_smile:

I used the following input shape to reproduce this issue:

model = Net().cuda()
x = torch.randn(32, 1, 51, 51).cuda()
output = model(x)

with PyTorch 1.4.0 using the CUDA10.1, CUDA10.0, and CUDA9.2 binaries, and all ran successfully on my machine.
Could you post some information about your machine, e.g. which GPU you are using, so that we can try to reproduce it again?
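You could print the device info e.g. like this (using the device index 4 from your earlier snippet):

print(torch.cuda.get_device_name(4))                     # GPU model
print(torch.cuda.get_device_properties(4).total_memory)  # total memory in bytes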

Hi ptrblck,

Thank you very much for your response.
Here is a link to my code containing this error (https://github.com/leesays92/NN-in-general/blob/master/Error1.ipynb). I would really appreciate it if you could look through it.

I’m currently using an RTX 2080 Ti GPU.
I upgraded to PyTorch 1.4.0 (it was originally 1.3.1) and it doesn’t work either.
The funny thing is that if I reduce the total amount of data (from 117,000 to 35,000), it works.

Thanks in advance!

Thanks for the code.
It is indeed an out-of-memory issue.
You are currently using the complete validation dataset as a single batch via:

val_data = ValDataset()
valloader = DataLoader(dataset=val_data, batch_size=len(val_data), num_workers=0, shuffle=False, drop_last=False)

Based on the previous outputs of your notebook, the length of the validation dataset should be 130000 - 117000 = 13000, which will use too much memory.
I’ve tested different shapes, and it seems a batch size of ~5000 would be the limit, using approx. 44GB on your GPU.
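A simple workaround is to keep a DataLoader with a moderate batch size and accumulate the statistics over the batches, e.g. (a rough sketch; the model, criterion, and device names are assumed from your notebook):

valloader = DataLoader(dataset=val_data, batch_size=1024, num_workers=0,
                       shuffle=False, drop_last=False)

model.eval()
val_loss = 0.0
correct = 0
with torch.no_grad():  # no gradients needed for validation
    for data, target in valloader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        # weight the batch loss by the batch size so the average is exact
        val_loss += criterion(output, target).item() * data.size(0)
        correct += (output.argmax(dim=1) == target).sum().item()

val_loss /= len(val_data)
accuracy = correct / len(val_data)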

Oh, I thought it was different from an OOM issue.
Thank you very much for your kind help!!