[resolved] BatchNorm1d -> CUDNN_STATUS_NOT_SUPPORTED

Here’s a minimal example (never mind that it looks strange):

import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.rand(10000000, 1)).cuda()
bn = nn.BatchNorm1d(1).cuda()
xbn = bn(x)
I get the stack trace below. I recall from some other thread that I would need to build PyTorch from the R4 branch to get rid of this? Is that still the case? I’m using PyTorch built from master a month ago on an AWS Deep Learning Ubuntu AMI instance, and that is where I get this error.

Thanks !

RuntimeError                              Traceback (most recent call last)
<ipython-input-25-6584c2ec408e> in <module>()
      6 bn = nn.BatchNorm1d(1)
      7 bn.cuda()
----> 8 x1cbn = bn(x)
      9 x1cbn.size()

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    201     def __call__(self, *input, **kwargs):
--> 202         result = self.forward(*input, **kwargs)
    203         for hook in self._forward_hooks.values():
    204             hook_result = hook(self, input, result)

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/batchnorm.pyc in forward(self, input)
     41         return F.batch_norm(
     42             input, self.running_mean, self.running_var, self.weight, self.bias,
---> 43             self.training, self.momentum, self.eps)
     45     def __repr__(self):

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/functional.pyc in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
    387                training=False, momentum=0.1, eps=1e-5):
    388     f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps)
--> 389     return f(input, weight, bias)

RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

This is with Cuda 8.0, cudnn 6, Ubuntu 14.04 (the Amazon Deep Learning AMI), with PyTorch installed from source a couple months ago.

And incidentally, it works fine if I use a smaller tensor size, say 100000 instead of 10000000.

You may want to wait for a response from someone at NVIDIA, but from what I remember, some very unusual input shapes are not supported by cuDNN (for various reasons, e.g. your GPU not having enough memory for the required workspace). Given that in your case feeding a smaller tensor works, that may be the reason.

You can try disabling cuDNN with torch.backends.cudnn.enabled = False and see if that works.


Thanks @albanD. I did this and it did not fix it:

import torch
import torch.nn as nn
from torch.autograd import Variable

torch.backends.cudnn.enabled = False

x = Variable(torch.rand(1000000, 1).contiguous()).cuda()

print torch.backends.cudnn.version()

bn = nn.BatchNorm1d(1).cuda()

xbn = bn(x)

Hi, I cannot reproduce your problem. Running the code sample you gave no longer raises any cuDNN error for me.

Did you try with a larger size, like 10 million?

Yes, this exact code works for me (after freezing my computer for a few seconds due to memory usage) and outputs:
(100000000L, 1L)

import torch
from torch.autograd import Variable
import torch.nn as nn

x = Variable(torch.rand(100000000, 1).contiguous()).cuda()

print torch.backends.cudnn.version()

bn = nn.BatchNorm1d(1)

xbn = bn(x)

Ah, my version shows 5110, so it looks like I’m still on cuDNN 5, although I thought I had installed 6.0.

Turned out I had forgotten to rebuild PyTorch from source after installing the cuDNN 6.0 files. Now it works fine.
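For anyone hitting the same thing, a quick sanity check of which cuDNN your PyTorch build is actually linked against (note that the version codes shown here, e.g. 5110 meaning cuDNN 5.1.10, match what was reported above):

```python
import torch

# Returns the cuDNN version PyTorch was compiled against,
# e.g. 5110 for cuDNN 5.1.10, or a 6xxx value for cuDNN 6.
# If this still shows the old version after installing new cuDNN files,
# PyTorch needs to be rebuilt from source to pick them up.
print(torch.backends.cudnn.version())
```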