So, to be more precise, I would like to know how the normalized values are calculated during training.
Below I have a code snippet which I will be referring to from now on.
If I run the following code, it always returns [-1.2247, 0.0000, 1.2247] for the first column, which I do not understand: the “right” normalization should be [-1., 0., 1.].
When I use bn.eval(), the output is correct.
What I would like to know is, what term am I missing during .train() that makes the values so different?
import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4., 5.],
                  [2., 2., 2., 2., 2.],
                  [3., 3., 3., 3., 3.]])
bn = nn.BatchNorm1d(5)

# forward the same batch many times so the running statistics converge
[bn(x) for _ in range(1000)]
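Printing the first column makes the issue concrete:

print(bn(x)[:, 0])   # tensor([-1.2247,  0.0000,  1.2247], grad_fn=...)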
Maybe I did not explain my issue well enough.
In the following, I am normalizing the values as explained in the documentation.
However, I am unable to recreate the actual values obtained when applying bn to beispiel_x.
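A sketch of what I am doing (beispiel_x is the same example batch as above; note that torch.var defaults to the unbiased estimator):

import torch

beispiel_x = torch.tensor([[1., 2., 3., 4., 5.],
                           [2., 2., 2., 2., 2.],
                           [3., 3., 3., 3., 3.]])

mean = beispiel_x.mean(dim=0)
var = beispiel_x.var(dim=0)              # unbiased by default: divides by N - 1
manual = (beispiel_x - mean) / torch.sqrt(var + 1e-5)

print(manual[:, 0])  # ≈ [-1.0000, 0.0000, 1.0000], not [-1.2247, 0.0000, 1.2247]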
@JanoschMenke torch BatchNorm1d doesn’t use the unbiased estimator for the variance when normalizing a batch in train mode; it divides by N rather than N - 1.
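You can verify this against your example; a minimal sketch (the only assumption is the default eps of 1e-5, exposed as bn.eps):

import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4., 5.],
                  [2., 2., 2., 2., 2.],
                  [3., 3., 3., 3., 3.]])
bn = nn.BatchNorm1d(5)                    # train mode by default

mean = x.mean(dim=0)
var_biased = x.var(dim=0, unbiased=False)  # divides by N, not N - 1
manual = (x - mean) / torch.sqrt(var_biased + bn.eps)

print(torch.allclose(bn(x), manual))   # True
print(manual[:, 0])                    # ≈ [-1.2247, 0.0000, 1.2247]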
Note that running_mean and running_var are only used to normalize the input when the model is in eval mode; in train mode they are merely updated as a side effect.
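That is why eval mode gives you the values you expected: after the 1000 forward passes the running estimates have converged to your batch’s statistics, and running_var stores the unbiased variance (exactly 1.0 for the first column). A self-contained sketch:

import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4., 5.],
                  [2., 2., 2., 2., 2.],
                  [3., 3., 3., 3., 3.]])
bn = nn.BatchNorm1d(5)
for _ in range(1000):
    bn(x)                        # train mode: each forward updates the running stats

bn.eval()
manual_eval = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
print(torch.allclose(bn(x), manual_eval))   # True
print(bn(x)[:, 0])                          # ≈ [-1.0000, 0.0000, 1.0000]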
Also, the BatchNorm1d module has learnable affine parameters, which you should take into account when computing the output yourself. They are initialized to 0 (bias) and 1 (gamma), though, so they don’t change the result here.
Finally, the running averages change on every forward pass through the model in train mode.
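With the default momentum of 0.1, each train-mode forward applies an exponential moving average update; note that running_var is updated with the *unbiased* batch variance. A sketch that checks the update rule:

import torch
import torch.nn as nn

x = torch.tensor([[1., 2., 3., 4., 5.],
                  [2., 2., 2., 2., 2.],
                  [3., 3., 3., 3., 3.]])
bn = nn.BatchNorm1d(5)           # momentum defaults to 0.1

old_mean = bn.running_mean.clone()
old_var = bn.running_var.clone()
bn(x)                            # one train-mode forward

print(torch.allclose(bn.running_mean, 0.9 * old_mean + 0.1 * x.mean(dim=0)))              # True
print(torch.allclose(bn.running_var, 0.9 * old_var + 0.1 * x.var(dim=0, unbiased=True)))  # True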