# Output of BatchNorm1d in PyTorch does not match output of manually normalizing input dimensions

In an attempt to understand how `BatchNorm1d` works in PyTorch, I tried to match the output of the `BatchNorm1d` operation on a 2D tensor by normalizing it manually. The manual output seems to be scaled down by a factor of 0.9747. Here’s the code (note that `affine` is set to `False`):

``````
import torch
import torch.nn as nn
from torch.autograd import Variable

X = torch.randn(20, 100) * 5 + 10
X = Variable(X)

B = nn.BatchNorm1d(100, affine=False)
y = B(X)

mu = torch.mean(X[:,1])
var_ = torch.var(X[:,1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:,1] - mu)/sigma
#the ratio below should be equal to one
print(x.data / y[:,1].data )
``````

Output is:

``````
0.9747
0.9747
0.9747
....
``````

Doing the same thing for `BatchNorm2d` works without any issues. How does `BatchNorm1d` calculate its output?

You need to pass `unbiased=False` to `torch.var`. BatchNorm normalizes with the biased variance estimator, and with a batch size of 20 the unbiased and biased standard deviations differ by exactly sqrt(19/20) ≈ 0.9747, the factor you observed.
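For instance, keeping the rest of the snippet above unchanged, computing the variance with `unbiased=False` reproduces the `BatchNorm1d` output exactly (a minimal check; shapes follow the original post):

``````python
import torch
import torch.nn as nn

X = torch.randn(20, 100) * 5 + 10

B = nn.BatchNorm1d(100, affine=False)
y = B(X)  # B is in training mode, so batch statistics are used

# Biased variance (unbiased=False), matching what BatchNorm computes
mu = torch.mean(X[:, 1])
var_ = torch.var(X[:, 1], unbiased=False)
x = (X[:, 1] - mu) / torch.sqrt(var_ + 1e-5)

print(torch.allclose(x, y[:, 1], atol=1e-4))  # True
``````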


Just for clarification: this should apply to BatchNorm2d (and 3d) as well, right? I guess the effect of Bessel’s correction becomes less significant as the number of elements per channel grows, which would explain why I didn’t see any discrepancy for BatchNorm2d.

I would presume it applies to the higher-d norms too, but I haven’t tested it myself.
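One way to check (a sketch, not from the thread): with the unbiased variance, the ratio to the BatchNorm output is sqrt((n-1)/n), where n is the number of elements each channel is normalized over — 20 in the 2-D example above (sqrt(19/20) ≈ 0.9747), but N*H*W for BatchNorm2d, so the discrepancy is tiny there:

``````python
import torch
import torch.nn as nn

X = torch.randn(20, 3, 10, 10) * 5 + 10

B = nn.BatchNorm2d(3, affine=False)
y = B(X)

# Manually normalize channel 0 with the *unbiased* variance
c = X[:, 0]
x = (c - c.mean()) / torch.sqrt(c.var(unbiased=True) + 1e-5)

# Each channel is normalized over N*H*W = 2000 elements, so the
# ratio sqrt(1999/2000) is much closer to 1 than sqrt(19/20)
n = c.numel()
print((x / y[:, 0]).mean())                   # ≈ 0.99975
print(torch.sqrt(torch.tensor((n - 1) / n)))  # ≈ 0.99975
``````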

Thank you for your answer. A follow-up question: how does `BatchNorm1d` calculate variance for 3D data? I noticed that the variance of normalized 3d data is nowhere near 1. Here’s what I did:

``````
X3 = torch.randn(150, 20, 100) * 2 + 4
X3 = Variable(X3)
B2 = nn.BatchNorm1d(20)
Y = B2(X3)
print(Y.var())
``````

Every time I ran the above code, the output came out to be a number approximately in the range (0.20,0.40).

The variance of the output for 2d and 4d inputs is close to 1 (as expected).

A couple of things:

1. BN applies an affine transform, so set the affine scaling weight to 1 before computing `Y`: `B2.weight.data.fill_(1)`.
2. Use the biased version of the variance (`unbiased=False`).
3. BN normalizes the data within each channel, so you should calculate `Y`'s variance per channel rather than over all of `Y`.

Altogether, this gives the correct result:

``````
X3 = torch.randn(150, 20, 100) * 2 + 4
X3 = Variable(X3)
B2 = nn.BatchNorm1d(20)
B2.weight.data.fill_(1)
Y = B2(X3)
Y = Y.transpose(0, 1).contiguous().view(20, -1)  # put data for each channel in the second dimension
print(Y.var(dim=-1, unbiased=False))
``````
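The per-channel behavior can also be verified directly: for a 3-D input of shape (N, C, L), `BatchNorm1d` normalizes each channel over both the batch and length dimensions (a sketch using `affine=False` to sidestep the affine transform entirely):

``````python
import torch
import torch.nn as nn

X3 = torch.randn(150, 20, 100) * 2 + 4  # (batch, channels, length)

B2 = nn.BatchNorm1d(20, affine=False)
Y = B2(X3)

# Manually normalize channel 0 over the batch and length dimensions
c = X3[:, 0, :]
y_manual = (c - c.mean()) / torch.sqrt(c.var(unbiased=False) + 1e-5)

print(torch.allclose(y_manual, Y[:, 0, :], atol=1e-4))  # True
``````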