Hi everyone,
My PyTorch is version 0.2, compiled with cuDNN 6.0.21. When there is a BatchNorm layer in the network, batch_size can't exceed 140000; otherwise I get:
"RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."
Without BatchNorm, batch_size can be very large, limited only by GPU memory. Does this mean that BatchNorm doesn't work with a very large batch_size, or is it a technical bug?
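For reference, BatchNorm's training-mode forward pass is just a per-feature mean and variance over the batch followed by a normalization, so nothing in the math itself caps the batch size; a limit like this one would have to come from the kernel that implements it. Below is a minimal pure-Python sketch of that computation for a single feature (one column of an N x 1 input), assuming the default eps=1e-05 and momentum=0.1 that BatchNorm1d prints later in this thread. The function name is hypothetical and this is not PyTorch's actual implementation.

```python
import math
import random


def batch_norm_1d_train(xs, running_mean, running_var,
                        weight=1.0, bias=0.0, momentum=0.1, eps=1e-5):
    """Hypothetical training-mode batch norm for one feature.

    Returns (outputs, new_running_mean, new_running_var).
    """
    n = len(xs)
    mean = sum(xs) / n
    # The batch is normalized with the biased variance (divide by n) ...
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    ys = [(x - mean) * inv_std * weight + bias for x in xs]
    # ... while the running estimate is updated with the unbiased variance (n - 1).
    unbiased_var = var * n / (n - 1) if n > 1 else var
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * unbiased_var
    return ys, new_mean, new_var


# Usage: a batch larger than the ~140000 rows that triggered the cuDNN error.
random.seed(0)
xs = [random.random() for _ in range(200_000)]
ys, running_mean, running_var = batch_norm_1d_train(xs, running_mean=0.0, running_var=1.0)
```

The only per-batch work is two reductions over N values, which scale linearly with batch size.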
Hi Xiao,
It seems like a technical bug that I can fix.
Is there a small script you can provide to reproduce this?
Hi smth,
I have a similar problem: batch_size can't exceed 140000; larger batches cause the same error Xiao showed.
PyTorch: 0.2.0_3, cuDNN version: 6021
If I set torch.backends.cudnn.enabled = False, there is no error.
If I use nn.BatchNorm1d(1, affine=False), there is also no error.
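Turning cuDNN off globally also takes it away from layers that work fine, such as convolutions. The traceback further down shows torch.nn.functional.batch_norm reading torch.backends.cudnn.enabled at call time, so it should be enough to switch it off only around the BatchNorm call. A minimal sketch of that workaround as a context manager; the helper name cudnn_disabled is hypothetical, and it only touches the enabled attribute of whatever object you pass in (torch.backends.cudnn in practice), so it does not import torch itself.

```python
import contextlib


@contextlib.contextmanager
def cudnn_disabled(cudnn_backend):
    """Hypothetical helper: temporarily set cudnn_backend.enabled = False."""
    previous = cudnn_backend.enabled
    cudnn_backend.enabled = False
    try:
        yield
    finally:
        # Restore the old value even if the wrapped call raised,
        # so later layers still get cuDNN.
        cudnn_backend.enabled = previous


# Usage with PyTorch (assumption, not run here):
#   with cudnn_disabled(torch.backends.cudnn):
#       xbn = bn(x)
```

Newer PyTorch releases also provide a built-in context manager, torch.backends.cudnn.flags(enabled=False), that serves the same purpose; check whether your version has it before rolling your own.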
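import torch
import torch.nn as nn
from torch.autograd import Variable  # required for Variable in PyTorch 0.2

torch.backends.cudnn.enabled = True
# 140000 rows x 1 feature, moved to the GPU
x = Variable(torch.rand(140000, 1).contiguous()).cuda()
print(torch.backends.cudnn.version())
bn = nn.BatchNorm1d(1)
bn.cuda()
xbn = bn(x)  # raises CUDNN_STATUS_NOT_SUPPORTED (traceback below)
xbn.size()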
BatchNorm1d(1, eps=1e-05, momentum=0.1, affine=True)
-----------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-55-d78b6a19b222> in <module>()
10 bn.cuda()
11
---> 12 xbn = bn(x)
13 xbn.size()
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/batchnorm.pyc in forward(self, input)
35 return F.batch_norm(
36 input, self.running_mean, self.running_var, self.weight, self.bias,
---> 37 self.training, self.momentum, self.eps)
38
39 def __repr__(self):
/usr/local/lib/python2.7/dist-packages/torch/nn/functional.pyc in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
637 training=False, momentum=0.1, eps=1e-5):
638 f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled)
--> 639 return f(input, weight, bias)
640
641
RuntimeError: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Thank you, I've sent a fix in https://github.com/pytorch/pytorch/pull/2919
It will be part of the next release.