[resolved] BatchNorm1d -> CUDNN_STATUS_NOT_SUPPORTED

Here’s a minimal example (never mind that it looks strange):

import torch
from torch.autograd import Variable
import torch.nn as nn

x = Variable(torch.rand(10000000, 1)).cuda()
bn = nn.BatchNorm1d(1)
bn.cuda()
xbn = bn(x)

I get the stack trace below. I recall from another thread that I would need to build PyTorch from the R4 branch to get rid of this; is that still the case? I'm using PyTorch built from master about a month ago on the AWS Deep Learning Ubuntu AMI, and that is where I see this error.

Thanks!

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-25-6584c2ec408e> in <module>()
      6 bn = nn.BatchNorm1d(1)
      7 bn.cuda()
----> 8 x1cbn = bn(x)
      9 x1cbn.size()

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    200 
    201     def __call__(self, *input, **kwargs):
--> 202         result = self.forward(*input, **kwargs)
    203         for hook in self._forward_hooks.values():
    204             hook_result = hook(self, input, result)

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/modules/batchnorm.pyc in forward(self, input)
     41         return F.batch_norm(
     42             input, self.running_mean, self.running_var, self.weight, self.bias,
---> 43             self.training, self.momentum, self.eps)
     44 
     45     def __repr__(self):

/home/ubuntu/src/anaconda2/lib/python2.7/site-packages/torch/nn/functional.pyc in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
    387                training=False, momentum=0.1, eps=1e-5):
    388     f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps)
--> 389     return f(input, weight, bias)
    390 
    391 

RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

This is with CUDA 8.0, cuDNN 6, and Ubuntu 14.04 (the Amazon Deep Learning AMI), with PyTorch installed from source a couple of months ago.

Incidentally, it works fine if I use a smaller tensor size, say 100000 instead of 10000000.
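
A rough sketch of a probe to narrow down where it starts failing (the batch sizes are arbitrary guesses; the contiguity check is only there because the error message mentions it):

import torch
from torch.autograd import Variable
import torch.nn as nn

bn = nn.BatchNorm1d(1).cuda()
for n in [100000, 1000000, 10000000]:     # batch sizes to probe; values are just guesses
    x = Variable(torch.rand(n, 1)).cuda()
    print n, x.data.is_contiguous()        # freshly created, so this should print True
    try:
        print bn(x).size()
    except RuntimeError as e:
        print 'failed at', n, ':', e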

Hi,
You may want to wait for a response from someone at NVIDIA, but from what I remember, some very unusual input shapes are not supported by cuDNN (for various reasons, e.g. your GPU not having enough memory for the required workspace). Given that a smaller tensor works in your case, that may be the reason.

You can try disabling cuDNN with torch.backends.cudnn.enabled = False and see if that works.
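
Something along these lines (a minimal sketch; saving and restoring the flag is optional):

import torch
from torch.autograd import Variable
import torch.nn as nn

old_flag = torch.backends.cudnn.enabled   # remember the current setting
torch.backends.cudnn.enabled = False      # force the non-cudnn batch norm path

x = Variable(torch.rand(10000000, 1)).cuda()
bn = nn.BatchNorm1d(1).cuda()
xbn = bn(x)                               # should no longer dispatch to cudnn

torch.backends.cudnn.enabled = old_flag   # restore the original setting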


Thanks @albanD. I did this and it did not fix it:

import torch
from torch.autograd import Variable
import torch.nn as nn

torch.backends.cudnn.enabled = False
x = Variable(torch.rand(1000000, 1).contiguous()).cuda()

print torch.backends.cudnn.version()

bn = nn.BatchNorm1d(1)
bn.cuda()

xbn = bn(x)
xbn.size()

Hi, I cannot reproduce your problem. Running the code sample that you gave no longer raises a cuDNN error for me.

Did you try with a larger size, like 10 million?

Yes, this exact code works for me (after freezing my computer for a few seconds due to memory usage) and outputs:
6021
(100000000L, 1L)

import torch
from torch.autograd import Variable
import torch.nn as nn

torch.backends.cudnn.enabled=False
x = Variable( torch.rand(100000000,1).contiguous()).cuda()

print torch.backends.cudnn.version()

bn = nn.BatchNorm1d(1)
bn.cuda()

xbn = bn(x)
print(xbn.size())

Ah, my version shows 5110, so it looks like I'm still on cuDNN 5, although I thought I had installed 6.0.

It turned out I had forgotten to rebuild PyTorch from source after installing the cuDNN 6.0 files. Now it works fine.
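
For anyone who lands here later, a rough sketch of the sanity check after rebuilding (the exact version number reported will depend on the cuDNN build you installed):

import torch
from torch.autograd import Variable
import torch.nn as nn

print torch.backends.cudnn.version()      # should now report a 6xxx build instead of 5110

torch.backends.cudnn.enabled = True       # make sure cudnn is back on
x = Variable(torch.rand(10000000, 1)).cuda()
bn = nn.BatchNorm1d(1).cuda()
print bn(x).size()                        # completes without the CUDNN_STATUS_NOT_SUPPORTED error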