Why am I getting different results for BatchNorm2d than expected, manual vs PyTorch?

For some reason, even though I am following the formula in the PyTorch docs, I am getting different results when trying to simulate BatchNorm2d by hand vs. PyTorch.

Here is the manual version:

>>> import torch as t
>>> from torch import nn
>>> 
>>> x = t.tensor([[[[1., 2.],
...                 [3., 4.]],
...                [[3., 4.],
...                 [5., 6.]]],
...               [[[1., 2.],
...                 [30., 40.]],
...                [[15., 4.],
...                 [5., 6.]]
...                ]])
>>> 
>>> x.shape
torch.Size([2, 2, 2, 2])
>>> x_mean = x.mean()
>>> x_var = x.var()
>>> eps = 1e-5
>>> print(x_mean)
tensor(8.1875)
>>> print(x_var)
tensor(123.3625)
>>> print((x - x_mean) / (x_var + eps) ** .5)
tensor([[[[-0.6471, -0.5571],
          [-0.4671, -0.3770]],

         [[-0.4671, -0.3770],
          [-0.2870, -0.1970]]],


        [[[-0.6471, -0.5571],
          [ 1.9639,  2.8642]],

         [[ 0.6134, -0.3770],
          [-0.2870, -0.1970]]]])

And here is the PyTorch version:

>>> m = nn.BatchNorm2d(2, affine=False)
>>> print(m(x))
tensor([[[[-0.6481, -0.5790],
          [-0.5099, -0.4407]],

         [[-0.8485, -0.5657],
          [-0.2828,  0.0000]]],


        [[[-0.6481, -0.5790],
          [ 1.3567,  2.0481]],

         [[ 2.5456, -0.5657],
          [-0.2828,  0.0000]]]])

I made sure to set affine=False to leave out the learnable gamma and beta parameters.

Have a look at this manual implementation.

Skimming through your code, it looks like the variance calculation is unbiased in your case.
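For example, something like this should show the difference, reusing x from above (the printed values are what I would expect with the default unbiased=True):

>>> x.var()                # unbiased: divides by n-1 (the default)
tensor(123.3625)
>>> x.var(unbiased=False)  # biased: divides by n, which is what batchnorm uses to normalize
tensor(115.6523)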

So yes, your implementation was helpful for finding the two bugs. The first is Bessel's correction (i.e. dividing by n-1 in the variance formula), due to unbiased=True.

The second is the dimensions I am computing the mean and variance over. It should have been

x.mean([0, 2, 3])

and

x.var([0, 2, 3], unbiased=False)
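
Putting both fixes together, something like the following (reusing x and eps from above; note I also add keepdim=True so the per-channel statistics broadcast back over the (N, C, H, W) input) should match the nn.BatchNorm2d output:

>>> x_mean = x.mean([0, 2, 3], keepdim=True)                 # per-channel mean over N, H, W
>>> x_var = x.var([0, 2, 3], unbiased=False, keepdim=True)   # biased per-channel variance
>>> print((x - x_mean) / (x_var + eps) ** .5)

This gives the same values as m(x) above.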
