# Different results between Tensor.var() and directly calculating the variance

I implemented BatchNorm2d, but I found that the output depends on how the variance of the input is computed. Please see the code below. The only difference between MyWrongBatchNorm and MyBatchNorm is how the variance is computed in forward().

MyWrongBatchNorm computes the variance using tensor.var():

``````
import torch
import torch.nn as nn


class MyWrongBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyWrongBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon

        # init weight (gamma), bias (beta), running mean, running var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            mean = input.mean([0, 2, 3])  # mean across dims 0, 2, 3
            mean = mean.view(1, self.insize, 1, 1)
            var = input.var([0, 2, 3])  # var across dims 0, 2, 3
            var = var.view(1, self.insize, 1, 1)

            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight * (input - mean) / torch.sqrt(var + self.epsilon) + bias

        if mode == 'test':
            pass  # in this question, only consider train mode

        return out
``````

MyBatchNorm computes the variance directly:

``````
class MyBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon

        # init weight (gamma), bias (beta), running mean, running var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            # mean across dims 0, 2, 3
            mean = input.mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1)
            # var across dims 0, 2, 3
            var = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1)

            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight * (input - mean) / torch.sqrt(var + self.epsilon) + bias

        if mode == 'test':
            pass  # in this question, only consider train mode

        return out
``````

Then run the code below:

``````
x = torch.randn(100, 3, 32, 32)

# num_features is the channel count, i.e. x.shape[1]
mybatchnorm = MyBatchNorm(num_features=x.shape[1])
y1 = mybatchnorm(x, mode='train')

mywrongbatchnorm = MyWrongBatchNorm(num_features=x.shape[1])
y2 = mywrongbatchnorm(x, mode='train')
``````

The results are quite different.

Why is this happening? I don’t think there’s anything wrong with it, so it looks very strange.

• I checked that the variance values are the same in MyBatchNorm and MyWrongBatchNorm,
i.e., `var1 = input.var([0,2,3]).view(1, self.insize, 1, 1)` and `var2 = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1,self.insize, 1, 1)` have the same values.

Batchnorm layers use the biased estimator to compute the `stddev`, so use `var = input.var([0,2,3], unbiased=False)` and the error norm drops to `Wrong result: tensor(3.5124e-05, grad_fn=<LinalgVectorNormBackward0>)`.
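To make the difference concrete, here is a minimal sketch that computes the per-channel variance three ways on an input of the same shape as in the question. The variable names and the choice of `float64` (to keep rounding noise out of the comparison) are assumptions for illustration, not part of the original code.

```python
import torch

torch.manual_seed(0)
# float64 keeps rounding noise well below the effect being shown
x = torch.randn(100, 3, 32, 32, dtype=torch.float64)

mean = x.mean(dim=(0, 2, 3), keepdim=True)

var_default = x.var([0, 2, 3])                      # default is unbiased: divides by n - 1
var_biased = x.var([0, 2, 3], unbiased=False)       # biased: divides by n
var_manual = ((x - mean) ** 2).mean(dim=(0, 2, 3))  # manual mean of squares: divides by n

print("default vs manual:", (var_default - var_manual).abs().max().item())
print("biased  vs manual:", (var_biased - var_manual).abs().max().item())
```

The manual computation should line up with `unbiased=False`, while the default `var()` should come out slightly larger for every channel.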

@ptrblck Thank you so much! I checked that `unbiased=False` gives the correct answer, i.e., `Wrong result: tensor(3.5124e-05, grad_fn=<LinalgVectorNormBackward0>)`.
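A check along these lines can be sketched as follows. It assumes the reference is `nn.BatchNorm2d` in training mode with its default `eps` and freshly initialized weight and bias; the `normalize` helper is hypothetical and only stands in for the `forward()` logic above. The exact numbers depend on the random input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(100, 3, 32, 32)


def normalize(x, var, eps=1e-5):
    # hypothetical helper: batch norm with weight = 1 and bias = 0
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


with torch.no_grad():
    # a freshly constructed module is in training mode, so it uses batch statistics
    reference = nn.BatchNorm2d(3)(x)

    out_unbiased = normalize(x, x.var([0, 2, 3], keepdim=True))
    out_biased = normalize(x, x.var([0, 2, 3], unbiased=False, keepdim=True))

    err_unbiased = torch.norm(reference - out_unbiased)
    err_biased = torch.norm(reference - out_biased)

print("error norm with unbiased variance:", err_unbiased.item())
print("error norm with biased variance:  ", err_biased.item())
```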

However, in the MyBatchNorm class, `var = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1,self.insize, 1, 1)` is also a biased estimator (because it divides by n, not n-1). So why does MyBatchNorm output the correct answer?

The built-in `nn.BatchNormXd` as well as the corrected `MyWrongBatchNorm` and your `MyBatchNorm` layers are all using the biased variance, so I’m unsure what the issue is.
Or to rephrase it and to stick closer to the argument name: none of these layers uses the unbiased statistic.
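For reference, the two estimators differ only by a constant factor. With $n$ values per channel and sample mean $\bar{x}$:

$$
\hat{\sigma}^2_{\text{biased}} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
\hat{\sigma}^2_{\text{unbiased}} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}\,\hat{\sigma}^2_{\text{biased}}
$$

For the input in the question, `x = torch.randn(100,3,32,32)`, each channel has $n = 100 \cdot 32 \cdot 32 = 102400$ values, so the factor is $102400/102399 \approx 1.00001$. That is likely small enough for the two variances to look identical when printed, even though the normalized outputs differ measurably.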

Ohhh, I was confused for a while. It was my mistake. There is no problem in MyBatchNorm. Thank you for your kind answer!