In an attempt to understand how BatchNorm1d works in PyTorch, I tried to match the output of a BatchNorm1d operation on a 2D tensor against a manual normalization. The manual output seems to be scaled down by a factor of 0.9747. Here's the code (note that affine is set to False):
import torch
import torch.nn as nn
from torch.autograd import Variable
X = torch.randn(20,100) * 5 + 10
X = Variable(X)
B = nn.BatchNorm1d(100, affine=False)
y = B(X)
mu = torch.mean(X[:,1])
var_ = torch.var(X[:,1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:,1] - mu)/sigma
# the ratio below should be equal to one
print(x.data / y[:,1].data)
Output is:
0.9747
0.9747
0.9747
....
Doing the same thing for BatchNorm2d works without any issues. How does BatchNorm1d calculate its output?
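The answers below point at the biased vs. unbiased variance. To make that concrete, here is a minimal check (my own sketch, not from the thread) showing that BatchNorm1d matches the manual computation once `unbiased=False` is passed to `torch.var`, and that the 0.9747 factor is exactly sqrt((N-1)/N) for a batch of N=20:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(20, 100) * 5 + 10
y = nn.BatchNorm1d(100, affine=False)(X)

mu = X[:, 1].mean()
# biased (population) variance, as BatchNorm uses internally
sigma = torch.sqrt(X[:, 1].var(unbiased=False) + 1e-5)
x = (X[:, 1] - mu) / sigma

print(torch.allclose(x, y[:, 1], atol=1e-4))  # True
print((19 / 20) ** 0.5)  # ≈ 0.9747, the observed ratio
```

With the default `unbiased=True`, `torch.var` divides by N-1 (Bessel's correction), while BatchNorm normalizes with the variance divided by N, hence the constant ratio of sqrt(19/20).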
Just for clarification: this should apply to BatchNorm2d (and 3d) as well, right? I guess since the effect of Bessel's correction becomes less significant as the number of elements per channel increases, I didn't see any discrepancy with BatchNorm2d.
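That intuition can be checked numerically (a quick sketch, with an assumed 32x32 spatial size for the 2D case): the per-channel sample count for BatchNorm2d is N*H*W rather than N, so sqrt((M-1)/M) is indistinguishable from 1:

```python
# sqrt(biased_var / unbiased_var) = sqrt((M-1)/M) for M samples per channel
for m in (20, 20 * 32 * 32):  # BatchNorm1d on (20, C) vs BatchNorm2d on (20, C, 32, 32)
    print(m, ((m - 1) / m) ** 0.5)
# 20    -> ≈ 0.9747 (clearly visible)
# 20480 -> ≈ 0.99998 (lost in floating-point noise)
```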
Thank you for your answer. A follow-up question: how does BatchNorm1d calculate the variance for 3D data? I noticed that the variance of normalized 3D data is nowhere near 1. Here's what I did:
Three things to note:
1. BN applies an affine transform, so you want to set the affine scaling weight to 1 before computing Y, i.e. B2.weight.data.fill_(1).
2. Use the biased version of the variance.
3. BN normalizes the data within each channel, so you should calculate Y's variance per channel, rather than over all of Y.
Altogether, this gives the correct result:
X3 = torch.randn(150, 20, 100) * 2 + 4
X3 = Variable(X3)
B2 = nn.BatchNorm1d(20)
B2.weight.data.fill_(1)  # neutralize the affine scale
Y = B2(X3)
Y = Y.transpose(0, 1).contiguous().view(20, -1)  # put data for each channel in the second dimension
print(Y.var(dim=-1, unbiased=False))  # biased per-channel variance: ~1 for every channel
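To answer the "how" directly: for 3D input of shape (N, C, L), BatchNorm1d computes the mean and biased variance per channel over dimensions 0 and 2. A sketch of the manual equivalent (assuming a PyTorch version where `mean`/`var` accept a tuple of dims, and using `affine=False` to sidestep the weight issue above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X3 = torch.randn(150, 20, 100) * 2 + 4
bn = nn.BatchNorm1d(20, affine=False)
Y = bn(X3)

# per-channel statistics over the batch (dim 0) and length (dim 2) dimensions
mu = X3.mean(dim=(0, 2), keepdim=True)
var = X3.var(dim=(0, 2), unbiased=False, keepdim=True)
Y_manual = (X3 - mu) / torch.sqrt(var + bn.eps)

print(torch.allclose(Y, Y_manual, atol=1e-4))  # True
```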