Question about BatchNorm2d

A naive question about how BatchNorm2d works. Say I have an input of size

im = torch.rand(1000,1,2,2)
print(im.shape)
torch.Size([1000, 1, 2, 2])

i.e. a batch of 1000 single-channel 2x2 matrices.
Now say I initialize a BatchNorm2d layer and apply it to my randomly generated tensor

m = nn.BatchNorm2d(1, affine=False, eps=0)
im = m(im)

where 1 corresponds to the number of channels, i.e. num_features in BatchNorm2d. I would expect torch.mean(im, 0) to be a 2x2 matrix that is essentially zero and torch.var(im, 0) to be a 2x2 matrix of ones. However, when I run the few lines above, I get

In [102]: torch.mean(im,0)
Out[102]: 
tensor([[[ 0.0650, -0.0563],
         [ 0.0067, -0.0154]]])

and

In [103]: torch.var(im,0)
Out[103]: 
tensor([[[0.9965, 1.0232],
         [1.0066, 0.9700]]])

These results are not what I expected. I'm sure I'm misunderstanding how BatchNorm2d works, but where is my mistake?

BatchNorm2d computes a single mean and variance per channel over the batch dimension and both spatial dimensions, not a separate statistic for each spatial location, so checking the statistics over dim 0 alone won't give you zeros and ones. You can see a manual (and slow) reference implementation in this code snippet, which I created some time ago.
As shown there, the proper check would be:

im.mean([0, 2, 3])
Out[55]: tensor([-5.6028e-08])

im.var([0, 2, 3], unbiased=False)
Out[58]: tensor([1.0000])
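To make the per-channel normalization concrete, here is a minimal sketch (not the linked reference implementation) that normalizes manually over dims (N, H, W) and compares against the layer. Since affine=False and eps=0, the two should match up to floating-point error:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
im = torch.rand(1000, 1, 2, 2)

# BatchNorm2d in training mode normalizes with batch statistics.
m = nn.BatchNorm2d(1, affine=False, eps=0)
out = m(im)

# Manual normalization: one mean/var per channel, computed over
# the batch and both spatial dimensions (dims 0, 2, 3).
mean = im.mean(dim=[0, 2, 3], keepdim=True)
var = im.var(dim=[0, 2, 3], unbiased=False, keepdim=True)
manual = (im - mean) / var.sqrt()

print(torch.allclose(out, manual, atol=1e-5))  # True
```

Note that BatchNorm uses the biased variance (unbiased=False) for normalization, which is why the same flag appears in the check above.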

I see, that makes sense. Thank you for the reference!