BatchNorm1d fails on GPU with RuntimeError: std::bad_cast

Hi, I ran into an issue:

# env
cuda-8.0-cudnn-7
python 2.7/3.5
torch-2.0

Code to reproduce:

import torch
from torch.autograd import Variable
a = Variable(torch.randn(2, 5).cuda(), requires_grad=True)
y = torch.nn.BatchNorm1d(5)(a)

## info
----> 1 y = torch.nn.BatchNorm1d(5)(a)
~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    222         for hook in self._forward_pre_hooks.values():
    223             hook(self, input)
--> 224         result = self.forward(*input, **kwargs)
    225         for hook in self._forward_hooks.values():
    226             hook_result = hook(self, input, result)

~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/batchnorm.pyc in forward(self, input)
     35         return F.batch_norm(
     36             input, self.running_mean, self.running_var, self.weight, self.bias,
---> 37             self.training, self.momentum, self.eps)
     38
     39     def __repr__(self):

~/anaconda2/lib/python2.7/site-packages/torch/nn/functional.pyc in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
    637                training=False, momentum=0.1, eps=1e-5):
    638     f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled)
--> 639     return f(input, weight, bias)
    640
    641

RuntimeError: std::bad_cast

The same code works fine on the CPU.
Has anyone else run into this?
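
To make the CPU/GPU comparison easier to repeat, here is a minimal sketch that runs the same layer on each available device and records either the output shape or the error message. It assumes a PyTorch version where tensors accept `device=` and `requires_grad=` directly (0.4 or later), which is not the version from the report. The helper name `run_batchnorm` is made up for illustration. Note that, unlike the snippet above, the sketch moves the module to the same device as the input; that is an assumption about the usual setup, not something stated in this report.

```python
import torch


def run_batchnorm(device):
    """Run BatchNorm1d(5) on a random (2, 5) input on `device` and return the output."""
    # Assumption: tensor-level requires_grad replaces the older Variable wrapper.
    x = torch.randn(2, 5, device=device, requires_grad=True)
    # Assumption: the module is moved to the same device as the input.
    bn = torch.nn.BatchNorm1d(5).to(device)
    return bn(x)


# Always test the CPU; add the GPU only if CUDA is usable on this machine.
devices = ["cpu"]
if torch.cuda.is_available():
    devices.append("cuda")

results = {}
for device in devices:
    try:
        y = run_batchnorm(device)
        results[device] = tuple(y.shape)
    except RuntimeError as err:
        # Keep the message so a GPU-only failure is visible next to the CPU result.
        results[device] = "RuntimeError: {}".format(err)
    print(device, results[device])
```

If the CPU entry shows a shape and the CUDA entry shows an error string, the failure is specific to the GPU path, which matches the behaviour described above.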

Answered here: https://github.com/pytorch/pytorch/issues/3936

Solved. Thank you very much!