[Possible Bug Report] nn.BatchNorm2d silently produces wrong results with permuted (non-contiguous) input tensor

Recently I came across a problem where my reimplementation of batch normalization produced different results from nn.BatchNorm2d.
After days of debugging, I found that when permute() is applied to the input tensor without calling contiguous(), and the last dimension of the input is 1 (e.g., [N,C,W,1]),
nn.BatchNorm2d produces wrong results without any warning. Only BatchNorm2d on CUDA is affected; BatchNorm1d, and BatchNorm2d on CPU, work correctly.
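
For reference, permute() returns a view with rearranged strides rather than copying data, so the result is non-contiguous. A quick check:

import torch

x = torch.rand([3, 800, 32])            # [N, W, C]
y = x.permute([0, 2, 1]).unsqueeze(-1)  # [N, C, W, 1]; a view, no data copy
print(y.is_contiguous())                # False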

The test code:

import torch
import torch.nn as nn

x_ = torch.rand([3, 800, 32], device='cuda:0')  # [N, W, C]

x = x_.permute([0, 2, 1]).unsqueeze(-1)  # [N, C, W, 1]
print(x.shape)  # torch.Size([3, 32, 800, 1])
C = x.shape[1]

bn0 = nn.BatchNorm2d(C, momentum=1, eps=1e-3, affine=False, device=x.device)  # momentum=1, so running_mean/var hold exactly the current batch's statistics
mean_ = x.mean(dim=(0, 2, 3), keepdim=True)
var_ = ((x - mean_) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # biased variance, as BN uses for normalization
x_hat = (x - mean_) / torch.sqrt(var_ + bn0.eps)
x_hat_2 = bn0(x)

print('on CUDA (No Contiguous)')
print((mean_.reshape([C]) - bn0.running_mean).abs().max())
print((var_.reshape([C]) - bn0.running_var).abs().max())
print((x_hat - x_hat_2).abs().max())

# =============================================================

x = x.contiguous()
print(x.shape)  # [N,C,W,1]
C = x.shape[1]

bn0 = nn.BatchNorm2d(C, momentum=1, eps=1e-3, affine=False, device=x.device)  # same setup as above
mean_ = x.mean(dim=(0, 2, 3), keepdim=True)
var_ = ((x - mean_) ** 2).mean(dim=(0, 2, 3), keepdim=True)
x_hat = (x - mean_) / torch.sqrt(var_ + bn0.eps)
x_hat_2 = bn0(x)

print('on CUDA (Contiguous)')
print((mean_.reshape([C]) - bn0.running_mean).abs().max())
print((var_.reshape([C]) - bn0.running_var).abs().max())
print((x_hat - x_hat_2).abs().max())

and the output is:

torch.Size([3, 32, 800, 1])
on CUDA (No Contiguous)
tensor(2.9802e-07, device='cuda:0')
tensor(3.5875e-05, device='cuda:0')
tensor(3.4871, device='cuda:0')
torch.Size([3, 32, 800, 1])
on CUDA (Contiguous)
tensor(2.0862e-07, device='cuda:0')
tensor(3.5882e-05, device='cuda:0')
tensor(9.5367e-07, device='cuda:0')

As can be seen, the hand-computed means and variances are nearly identical to the running statistics in both cases, and since the BN has no affine transform, the normalized outputs should match as well. Yet the non-contiguous case produces a maximum output error of 3.4871, while the contiguous case agrees to within float precision.
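
Side note: the small running_var difference (~3.6e-05 in both runs) is expected rather than suspicious. BatchNorm normalizes with the biased batch variance but stores the unbiased (Bessel-corrected) estimate in running_var, while my hand computation is biased. Rescaling the hand-computed variance by n / (n - 1), with n = 3 * 800 * 1 = 2400 elements per channel, should shrink that gap to float-precision noise:

n = x.numel() // C                 # elements per channel: 3 * 800 * 1 = 2400
var_unbiased = var_ * n / (n - 1)  # Bessel correction to match running_var
print((var_unbiased.reshape([C]) - bn0.running_var).abs().max())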

My environment is:

Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.1.74
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
GPU 3: NVIDIA GeForce RTX 3090
GPU 4: NVIDIA GeForce RTX 3090
GPU 5: NVIDIA GeForce RTX 3090
GPU 6: NVIDIA GeForce RTX 3090
GPU 7: NVIDIA GeForce RTX 3090

Nvidia driver version: 510.54
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] numpydoc==1.1.0
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111
[conda] blas                      1.0                         mkl    defaults
[conda] mkl                       2021.4.0           h06a4308_640    defaults
[conda] mkl-service               2.4.0            py39h7f8727e_0    defaults
[conda] mkl_fft                   1.3.1            py39hd3c417c_0    defaults
[conda] mkl_random                1.2.2            py39h51133e4_0    defaults
[conda] numpy                     1.20.3           py39hf144106_0    defaults
[conda] numpy-base                1.20.3           py39h74d4b33_0    defaults
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1    defaults
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

Thanks for raising this issue and for posting the minimal code snippet (it helps a lot in debugging)!

I cannot reproduce the issue in the latest stable release (1.12.1), so I assume it was a known issue that was fixed somewhere between 1.9 and 1.12.
My output:

torch.Size([3, 32, 800, 1])
on CUDA (No Contiguous)
tensor(2.9802e-07, device='cuda:0')
tensor(3.5658e-05, device='cuda:0')
tensor(1.1921e-06, device='cuda:0')
torch.Size([3, 32, 800, 1])
on CUDA (Contiguous)
tensor(1.7881e-07, device='cuda:0')
tensor(3.5658e-05, device='cuda:0')
tensor(8.3447e-07, device='cuda:0')

Could you update your PyTorch install and rerun your code?
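
In case you have to stay on 1.9 for a while: your own contiguous run already demonstrates a workaround, i.e. forcing a standard memory layout before the layer (at the cost of one extra copy per forward pass). A minimal sketch, reusing the names from your snippet:

print(torch.__version__, torch.version.cuda)  # confirm which build is actually running

x = x_.permute([0, 2, 1]).unsqueeze(-1).contiguous()  # force a dense, standard layout
x_hat_2 = bn0(x)                                      # matches the hand-computed result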

Thank you for replying. I updated my PyTorch to 1.12.1 and the bug disappears. Happy to know that this issue is fixed in the current version.


Thanks for verifying it!