Recently I ran into a problem: my reimplemented batchnorm2d produces different results from nn.BatchNorm2d.
After days of debugging, I found that it happens when permute() is applied to the input tensor without a following contiguous(), and the last dimension of the input tensor is 1 (e.g., [N,C,W,1]).
In that case BatchNorm2d produces wrong results without any warning. Only BatchNorm2d on CUDA is affected; BatchNorm1d and the CPU path work correctly.
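To make the precondition concrete, here is a small sketch of what permute() plus unsqueeze(-1) does to the memory layout. It uses the same shapes as the test code but runs on CPU (my assumption, so that it works without a GPU); it only inspects strides and does not reproduce the bug itself.

```python
import torch

# Same shapes as in the report, but on CPU so the snippet runs anywhere.
x_ = torch.rand([3, 800, 32])            # [N, W, C]
x = x_.permute([0, 2, 1]).unsqueeze(-1)  # [N, C, W, 1], still a view of x_

print(x.shape)            # torch.Size([3, 32, 800, 1])
print(x.stride())         # (25600, 1, 32, 1): the channel dim has stride 1
print(x.is_contiguous())  # False

# contiguous() copies the data into the default row-major layout.
x_c = x.contiguous()
print(x_c.stride())         # (25600, 800, 1, 1)
print(x_c.is_contiguous())  # True
print(torch.equal(x, x_c))  # True: same values, different memory layout
```

Both tensors hold identical values, so any operator should return the same result for either of them.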
The test code:
import torch
import torch.nn as nn
x_ = torch.rand([3,800,32], device='cuda:0') # [N,W,C]
x = x_.permute([0,2,1]).unsqueeze(-1) # [N,C,W,1]
print(x.shape) # [N,C,W,1]
C = x.shape[1]
bn0 = nn.BatchNorm2d(C,momentum=1,eps=1e-3,affine=False,device=x.device) # momentum=1, so running_mean/running_var hold the statistics of the current batch
mean_ = x.mean(dim=(0,2,3),keepdim=True)
var_ = ((x - mean_)**2).mean(dim=(0,2,3),keepdim=True)
x_hat = (x-mean_) / torch.sqrt(var_+bn0.eps)
x_hat_2 = bn0(x)
print('on CUDA (No Contiguous)')
print((mean_.reshape([C])-bn0.running_mean).abs().max())
print((var_.reshape([C])-bn0.running_var).abs().max())
print((x_hat - x_hat_2).abs().max())
# =============================================================
x = x.contiguous()
print(x.shape) # [N,C,W,1]
C = x.shape[1]
bn0 = nn.BatchNorm2d(C,momentum=1,eps=1e-3,affine=False,device=x.device) # momentum=1, so running_mean/running_var hold the statistics of the current batch
mean_ = x.mean(dim=(0,2,3),keepdim=True)
var_ = ((x - mean_)**2).mean(dim=(0,2,3),keepdim=True)
x_hat = (x-mean_) / torch.sqrt(var_+bn0.eps)
x_hat_2 = bn0(x)
print('on CUDA (Contiguous)')
print((mean_.reshape([C])-bn0.running_mean).abs().max())
print((var_.reshape([C])-bn0.running_var).abs().max())
print((x_hat - x_hat_2).abs().max())
The output is:
torch.Size([3, 32, 800, 1])
on CUDA (No Contiguous)
tensor(2.9802e-07, device='cuda:0')
tensor(3.5875e-05, device='cuda:0')
tensor(3.4871, device='cuda:0')
torch.Size([3, 32, 800, 1])
on CUDA (Contiguous)
tensor(2.0862e-07, device='cuda:0')
tensor(3.5882e-05, device='cuda:0')
tensor(9.5367e-07, device='cuda:0')
As can be seen, the hand-calculated means and variances nearly match the running statistics in both cases. For batch norm without an affine transform, the final outputs should therefore match as well, but the non-contiguous setting produces wrong results (a maximum absolute difference of 3.4871).
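The comparison above can also be wrapped in a helper so that both layouts go through exactly the same code. The sketch below is a hypothetical refactoring of the test code, not part of the original post: the function name compare_with_batchnorm and the dictionary keys are my own, and it falls back to CPU when no GPU is available. It additionally compares running_var against the unbiased variance, since PyTorch documents that the running variance is updated with the unbiased estimator; that would account for the roughly 3.5e-05 variance difference seen in both runs.

```python
import torch
import torch.nn as nn


def compare_with_batchnorm(x, eps=1e-3):
    """Max abs differences between hand-computed batch norm and nn.BatchNorm2d.

    Hypothetical helper; x is expected to have shape [N, C, H, W].
    """
    C = x.shape[1]
    n = x.numel() // C  # number of elements per channel
    # momentum=1 makes the running statistics equal to the current batch statistics.
    bn = nn.BatchNorm2d(C, momentum=1.0, eps=eps, affine=False).to(x.device)
    bn.train()

    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # biased variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    out = bn(x)

    return {
        "mean": (mean.reshape(C) - bn.running_mean).abs().max().item(),
        "var_biased": (var.reshape(C) - bn.running_var).abs().max().item(),
        # Rescale the biased variance to the unbiased one before comparing.
        "var_unbiased": (var.reshape(C) * n / (n - 1) - bn.running_var).abs().max().item(),
        "output": (x_hat - out).abs().max().item(),
    }


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand([3, 800, 32], device=device).permute([0, 2, 1]).unsqueeze(-1)
for name, t in [("no contiguous", x), ("contiguous", x.contiguous())]:
    print(name, compare_with_batchnorm(t))
```

On an affected install, only the "output" entry of the non-contiguous CUDA run should stand out; the other entries should stay small in both runs.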
My environment is:
Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.1.74
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
GPU 3: NVIDIA GeForce RTX 3090
GPU 4: NVIDIA GeForce RTX 3090
GPU 5: NVIDIA GeForce RTX 3090
GPU 6: NVIDIA GeForce RTX 3090
GPU 7: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.54
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] numpydoc==1.1.0
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111
[conda] blas 1.0 mkl defaults
[conda] mkl 2021.4.0 h06a4308_640 defaults
[conda] mkl-service 2.4.0 py39h7f8727e_0 defaults
[conda] mkl_fft 1.3.1 py39hd3c417c_0 defaults
[conda] mkl_random 1.2.2 py39h51133e4_0 defaults
[conda] numpy 1.20.3 py39hf144106_0 defaults
[conda] numpy-base 1.20.3 py39h74d4b33_0 defaults
[conda] numpydoc 1.1.0 pyhd3eb1b0_1 defaults
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchvision 0.10.0+cu111 pypi_0 pypi
Thanks for raising this issue and for posting the minimal code snippet (it helps a lot in debugging)!
I cannot reproduce the issue in the latest stable release (1.12.1), so I guess it was a known issue that was already fixed between 1.9 and 1.12.
My output:
torch.Size([3, 32, 800, 1])
on CUDA (No Contiguous)
tensor(2.9802e-07, device='cuda:0')
tensor(3.5658e-05, device='cuda:0')
tensor(1.1921e-06, device='cuda:0')
torch.Size([3, 32, 800, 1])
on CUDA (Contiguous)
tensor(1.7881e-07, device='cuda:0')
tensor(3.5658e-05, device='cuda:0')
tensor(8.3447e-07, device='cuda:0')
Could you update your PyTorch install and rerun your code?
Thank you for replying. I updated my PyTorch install to 1.12.1 and the bug disappeared. Happy to know that this issue is fixed in the current version.
Thanks for verifying it!