Different result between Tensor.var() vs directly calculating variance

I implemented BatchNorm2d, but I found that the results differ depending on how the variance of the input is computed. Please see the code below. The only difference between MyWrongBatchNorm and MyBatchNorm is how the variance is computed in forward().

MyWrongBatchNorm computes the variance using Tensor.var():

import torch
import torch.nn as nn

class MyWrongBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyWrongBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon
        
        # init weight(gamma), bias(beta),running mean, var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            mean = input.mean([0,2,3])  #mean across dims 0,2,3
            mean = mean.view(1, self.insize, 1, 1)
            var = input.var([0,2,3])  #var  across dims 0,2,3
            var = var.view(1, self.insize, 1, 1)
           
            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight*(input-mean)/torch.sqrt(var+self.epsilon) + bias

        if mode == 'test':
            pass # in this question, only consider train mode

        return out

MyBatchNorm computes the variance directly:

class MyBatchNorm(nn.Module):
    def __init__(self, num_features, momentum=0.9, epsilon=1e-05):
        super(MyBatchNorm, self).__init__()
        self.momentum = momentum
        self.insize = num_features
        self.epsilon = epsilon
        
        # init weight(gamma), bias(beta),running mean, var
        self.weight = nn.Parameter(torch.ones(self.insize))
        self.bias = nn.Parameter(torch.zeros(self.insize))
        self.run_mean = torch.zeros(self.insize)
        self.run_var = torch.ones(self.insize)

    def forward(self, input, mode):
        if mode == 'train':
            mean = input.mean(dim=(0, 2, 3)).view(1,self.insize, 1, 1)   #mean across dims 0,2,3
            var = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1,self.insize, 1, 1)  #var  across dims 0,2,3

            weight = self.weight.view([1, self.insize, 1, 1])
            bias = self.bias.view([1, self.insize, 1, 1])
            out = weight*(input-mean)/torch.sqrt(var+self.epsilon) + bias

        if mode == 'test':
            pass # in this question, only consider train mode

        return out

Then run the code below:

x = torch.randn(100,3,32,32)
answer = nn.BatchNorm2d(x.shape[1])(x)

mybatchnorm = MyBatchNorm(num_features=x.shape[1])
y1 = mybatchnorm(x, mode='train')
print('Right result: ',torch.norm(y1-answer, p=2))

mywrongbatchnorm = MyWrongBatchNorm(num_features=x.shape[1])
y2 = mywrongbatchnorm(x, mode='train')
print('Wrong result: ',torch.norm(y2-answer, p=2))

The results are quite different.
Right result: tensor(2.8985e-05, grad_fn=<LinalgVectorNormBackward0>)
Wrong result: tensor(0.0027, grad_fn=<LinalgVectorNormBackward0>)

Why is this happening? I don’t see anything wrong with either implementation, so this looks very strange to me.

  • I checked that the variance values are the same in MyBatchNorm and MyWrongBatchNorm,
    i.e., var1 = input.var([0,2,3]).view(1, self.insize, 1, 1) and var2 = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1) have the same values.

BatchNorm layers use the biased estimator to compute the stddev, so use var = input.var([0,2,3], unbiased=False) and the error norm drops to: Wrong result: tensor(3.5124e-05, grad_fn=<LinalgVectorNormBackward0>).
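
For illustration, here is a minimal sketch (not from the original thread; it uses a fresh random x of the same shape as in the question) showing that, at initialization, nn.BatchNorm2d in training mode matches the manual formula only when the biased variance is used:

import torch
import torch.nn as nn

x = torch.randn(100, 3, 32, 32)
eps = 1e-5

mean = x.mean(dim=(0, 2, 3), keepdim=True)
var_biased = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # divides by n
var_unbiased = x.var(dim=(0, 2, 3), keepdim=True)                # divides by n - 1 (default)

ref = nn.BatchNorm2d(3)(x)  # weight=1, bias=0 at init, so this is plain normalization

print(torch.norm((x - mean) / torch.sqrt(var_biased + eps) - ref))    # tiny (float round-off)
print(torch.norm((x - mean) / torch.sqrt(var_unbiased + eps) - ref))  # noticeably larger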

@ptrblck Thank you so much! I confirmed that unbiased=False produces the correct answer, i.e., Wrong result: tensor(3.5124e-05, grad_fn=<LinalgVectorNormBackward0>).

However, in the MyBatchNorm class, var = ((input - mean) ** 2).mean(dim=(0, 2, 3)).view(1, self.insize, 1, 1) is also a biased estimator (it divides by n, not n-1). So why does MyBatchNorm output the correct answer?

The built-in nn.BatchNormXd as well as the corrected MyWrongBatchNorm and your MyBatchNorm layers are all using the biased variance, so I’m unsure what the issue is.
Or to rephrase it and to stick closer to the argument name: none of these layers uses the unbiased statistic.
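
For concreteness, a minimal sketch (again with a fresh random x, not code from the thread) showing that the two estimators in the question are not equal; they differ by the Bessel factor n / (n - 1), which with n = 100 * 32 * 32 is so close to 1 that a quick comparison (e.g. torch.allclose at default tolerances) can make them look identical:

import torch

x = torch.randn(100, 3, 32, 32)
n = x.shape[0] * x.shape[2] * x.shape[3]            # 100 * 32 * 32 = 102400

mean = x.mean(dim=(0, 2, 3), keepdim=True)
var_unbiased = x.var(dim=(0, 2, 3))                 # divides by n - 1 (as in MyWrongBatchNorm)
var_biased = ((x - mean) ** 2).mean(dim=(0, 2, 3))  # divides by n     (as in MyBatchNorm)

print(var_unbiased / var_biased)  # each channel is ~ n / (n - 1), not exactly 1
print(n / (n - 1))                # 1.0000097...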

Ohhh, I was confused for a while. It was my mistake; there is no problem with MyBatchNorm. Thank you for your kind answer!