Even though I am following the formula in the PyTorch docs, I am getting different results when I try to simulate `BatchNorm2d` by hand versus letting PyTorch compute it.

Here is the manual version:

```
>>> import torch as t
>>> x = t.tensor([[[[1., 2.],
...                 [3., 4.]],
...                [[3., 4.],
...                 [5., 6.]]],
...               [[[1., 2.],
...                 [30., 40.]],
...                [[15., 4.],
...                 [5., 6.]]]])
>>>
>>> x.shape
torch.Size([2, 2, 2, 2])
```

```
>>> x_mean = x.mean()
>>> x_var = x.var()
>>> eps = 1e-5
>>> print(x_mean)
tensor(8.1875)
>>> print(x_var)
tensor(123.3625)
>>> print((x - x_mean) / (x_var + eps) ** .5)
tensor([[[[-0.6471, -0.5571],
          [-0.4671, -0.3770]],
         [[-0.4671, -0.3770],
          [-0.2870, -0.1970]]],
        [[[-0.6471, -0.5571],
          [ 1.9639,  2.8642]],
         [[ 0.6134, -0.3770],
          [-0.2870, -0.1970]]]])
```

And here is the PyTorch version:

```
>>> from torch import nn
>>> m = nn.BatchNorm2d(2, affine=False)
>>> print(m(x))
tensor([[[[-0.6481, -0.5790],
          [-0.5099, -0.4407]],
         [[-0.8485, -0.5657],
          [-0.2828,  0.0000]]],
        [[[-0.6481, -0.5790],
          [ 1.3567,  2.0481]],
         [[ 2.5456, -0.5657],
          [-0.2828,  0.0000]]]])
```

I made sure to set `affine=False` to leave out the learnable `gamma` and `beta` parameters.
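For what it's worth, my reading of the docs is that the statistics are computed per channel (over the batch and spatial dimensions) and that the formula uses the biased variance estimator (`unbiased=False`), rather than a single mean/variance over the whole tensor. A sketch of that per-channel version, which can be compared against the module's output:

```python
import torch as t
from torch import nn

x = t.tensor([[[[1., 2.],
                [3., 4.]],
               [[3., 4.],
                [5., 6.]]],
              [[[1., 2.],
                [30., 40.]],
               [[15., 4.],
                [5., 6.]]]])

# Per-channel statistics over the batch and spatial dims (0, 2, 3),
# with the biased variance estimator, as described in the docs.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / (var + 1e-5).sqrt()

# Module output (training mode, so it uses the batch statistics).
ref = nn.BatchNorm2d(2, affine=False)(x)
print(manual)
```

If this reading is right, `manual` should agree with `ref` to within floating-point precision.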