I implemented BatchNorm2d myself, but I found that the output depends on how the variance of the input is computed. Please see the code below. The only difference between MyWrongBatchNorm and MyBatchNorm is how the variance is computed in forward().

MyWrongBatchNorm computes the variance using tensor.var():

```
import torch
import torch.nn as nn


class MyWrongBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyWrongBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon
        # init weight (gamma), bias (beta), running mean, running var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            mean = input.mean([0, 2, 3])  # mean across dims 0, 2, 3
            mean = mean.view(1, self.insize, 1, 1)
            var = input.var([0, 2, 3])  # var across dims 0, 2, 3
            var = var.view(1, self.insize, 1, 1)
            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight * (input - mean) / torch.sqrt(var + self.epsilon) + bias
        else:
            # in this question, only train mode is considered
            raise NotImplementedError("only mode='train' is implemented")
        return out
```

MyBatchNorm computes the variance directly from the mean of squared deviations:

```
class MyBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon
        # init weight (gamma), bias (beta), running mean, running var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            # mean across dims 0, 2, 3
            mean = input.mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1)
            # var across dims 0, 2, 3, computed as the mean squared deviation
            var = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1)
            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight * (input - mean) / torch.sqrt(var + self.epsilon) + bias
        else:
            # in this question, only train mode is considered
            raise NotImplementedError("only mode='train' is implemented")
        return out
```

Then I ran the code below:

```
x = torch.randn(100, 3, 32, 32)
answer = nn.BatchNorm2d(x.shape[1])(x)  # reference output (module is in train mode by default)

mybatchnorm = MyBatchNorm(num_features=x.shape[1])
y1 = mybatchnorm(x, mode='train')
print('Right result: ', torch.norm(y1 - answer, p=2))

mywrongbatchnorm = MyWrongBatchNorm(num_features=x.shape[1])
y2 = mywrongbatchnorm(x, mode='train')
print('Wrong result: ', torch.norm(y2 - answer, p=2))
```

The two results differ by about two orders of magnitude:

Right result: tensor(2.8985e-05, grad_fn=<...>)

Wrong result: tensor(0.0027, grad_fn=<...>)

Why is this happening? I don't see anything wrong with either implementation, so this looks very strange.

- I checked that the variance values are the same in MyBatchNorm and MyWrongBatchNorm.

i.e., var1 = input.var([0,2,3]).view(1, self.insize, 1, 1) and var2 = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1,self.insize, 1, 1) have the same values.
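
The check above can be made stricter by comparing the two variance expressions numerically instead of by eye. The sketch below is a minimal, self-contained version of that comparison. It relies on the fact that tensor.var() applies Bessel's correction by default (it divides by N - 1), while the mean of squared deviations divides by N. The variable names (`var_direct`, `var_default`, `var_biased`, `n`) and the fixed seed are only for illustration and are not part of the code above.

```python
import torch

torch.manual_seed(0)  # illustrative seed, only to make the run repeatable
x = torch.randn(100, 3, 32, 32)

# Per-channel mean, kept broadcastable against x
mean = x.mean(dim=(0, 2, 3), keepdim=True)

# Mean of squared deviations: divides by N (biased estimator)
var_direct = ((x - mean) ** 2).mean(dim=(0, 2, 3))

# tensor.var() with default arguments: divides by N - 1 (unbiased estimator)
var_default = x.var(dim=(0, 2, 3))

# tensor.var() with Bessel's correction turned off: divides by N
var_biased = x.var(dim=(0, 2, 3), unbiased=False)

# Number of elements reduced per channel: 100 * 32 * 32
n = x.numel() // x.shape[1]

print('max |var_default - var_direct|:', (var_default - var_direct).abs().max())
print('max |var_biased  - var_direct|:', (var_biased - var_direct).abs().max())
print('expected ratio n / (n - 1):    ', n / (n - 1))
```

With n = 102400 elements per channel, the two estimators differ by a relative factor of only about 1e-5, which is easy to miss when printing the tensors. If this is the cause, the normalized outputs would differ by a factor of roughly sqrt(n / (n - 1)) ≈ 1 + 4.9e-6, and since the L2 norm of the normalized output is about sqrt(100 * 3 * 32 * 32) ≈ 554, the expected gap is about 554 * 4.9e-6 ≈ 0.0027, which is consistent with the "Wrong result" reported above. Passing `unbiased=False` to tensor.var() in MyWrongBatchNorm would be one way to test this explanation.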