BatchNorm2d works with batch size 1: what is it doing?

BatchNorm2d works even when the batch size is 1, which puzzles me. What is it doing in that case? The only related thread I could find is https://github.com/pytorch/pytorch/issues/1381, which doesn't offer much explanation.

Minimal example (imports added so it runs as-is):

import torch
import torch.nn as nn
from torch.autograd import Variable

x = torch.FloatTensor(1, 1, 2, 2)
x[0, 0, 0, :] = 1   # first row of the 2x2 feature map = 1
x[0, 0, 1, :] = 2   # second row = 2
m = nn.BatchNorm2d(1)
y = m(Variable(x))

# output is
# (0,0,.,.) =  [[-0.1544, -0.1544], [0.1544, 0.1544]]

Those values come from the affine parameters gamma and beta. They are initialized as gamma ~ U[0, 1] and beta = 0. If you set them to the identity (gamma = 1, beta = 0):

import torch
import torch.nn as nn
from torch.autograd import Variable

x = torch.FloatTensor(1, 1, 2, 2)
x[0, 0, 0, :] = 1
x[0, 0, 1, :] = 2
m = nn.BatchNorm2d(1)
m.weight.data.fill_(1)  # gamma = 1
m.bias.data.zero_()     # beta = 0
y = m(Variable(x))

# you will get
>>> y

(0 ,0 ,.,.) =
 -1.0000 -1.0000
  1.0000  1.0000
[torch.FloatTensor of size (1,1,2,2)]
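To see where the ±1 comes from, you can redo the normalization by hand (a sketch using the current tensor API; BatchNorm2d normalizes each channel with the mean and *biased* variance computed over N×H×W, with eps = 1e-5 by default):

import torch

x = torch.tensor([[[[1., 1.], [2., 2.]]]])
mean = x.mean()              # 1.5
var = x.var(unbiased=False)  # 0.25 -- biased variance, not the unbiased 1/3
y = (x - mean) / torch.sqrt(var + 1e-5)
print(y)  # ~ [[-1, -1], [1, 1]], matching the BatchNorm2d output above

So with batch size 1 it simply normalizes over the spatial dimensions of that single sample.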


After y = m(x), m.running_mean = 0.1500 and m.running_var = 0.9333.
Why is the resulting y not (x - 0.15) / sqrt(0.9333), which would be

tensor([[[[0.8798, 0.8798],
          [1.9149, 1.9149]]]])

?
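Those numbers do show up, just not in training mode. In training mode (the default) BatchNorm2d normalizes with the current batch's biased statistics and only *updates* running_mean and running_var (with the default momentum of 0.1); the running estimates are used for normalization only after you call m.eval(). A sketch, again with the current tensor API:

import torch
import torch.nn as nn

x = torch.tensor([[[[1., 1.], [2., 2.]]]])
m = nn.BatchNorm2d(1)
m.weight.data.fill_(1)
m.bias.data.zero_()

# Training mode: uses the batch mean 1.5 and biased variance 0.25,
# so y = [[-1, -1], [1, 1]].
y_train = m(x)

# The forward pass also updated the running estimates:
# running_mean = 0.1 * 1.5             = 0.15
# running_var  = 0.9 * 1 + 0.1 * (1/3) = 0.9333  (update uses the unbiased variance)

# Eval mode: now the running statistics are used.
m.eval()
y_eval = m(x)  # (x - 0.15) / sqrt(0.9333 + eps)
               # ~ [[0.8798, 0.8798], [1.9149, 1.9149]]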