BatchNorm2d works even when the batch size is 1, which puzzles me. What is it actually normalizing over when there is only one sample in the batch? The only related thread I could find is https://github.com/pytorch/pytorch/issues/1381, which doesn't have much explanation.
minimal example:

import torch
import torch.nn as nn
from torch.autograd import Variable

x = torch.FloatTensor(1, 1, 2, 2)
x[0, 0, 0, :] = 1
x[0, 0, 1, :] = 2
m = nn.BatchNorm2d(1)
y = m(Variable(x))
# output is
# (0,0,.,.) = [[-0.1544, -0.1544], [0.1544, 0.1544]]
Those are the gamma and beta parameters (m.weight and m.bias). They are initialized as gamma ~ U[0, 1] and beta = 0, which is why your output is scaled by a random factor (here roughly 0.1544). If you do
import torch
import torch.nn as nn
from torch.autograd import Variable

x = torch.FloatTensor(1, 1, 2, 2)
x[0, 0, 0, :] = 1
x[0, 0, 1, :] = 2
m = nn.BatchNorm2d(1)
m.weight.data.fill_(1)  # gamma
m.bias.data.zero_()     # beta
y = m(Variable(x))
# you will get
>>> y
(0 ,0 ,.,.) =
-1.0000 -1.0000
1.0000 1.0000
[torch.FloatTensor of size (1,1,2,2)]
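So with gamma = 1 and beta = 0 you see the plain normalization. The reason a batch size of 1 still works is that in training mode the per-channel statistics are computed over the batch *and* spatial dimensions, i.e. over N*H*W values; here that is 1*2*2 = 4 elements, so the mean is 1.5 and the (biased) variance is 0.25. A rough sketch of the computation (using the modern tensor API instead of Variable; the eps value is BatchNorm2d's default):

```python
import torch

# same input as above: one sample, one channel, 2x2
x = torch.zeros(1, 1, 2, 2)
x[0, 0, 0, :] = 1
x[0, 0, 1, :] = 2

# per-channel statistics over batch + spatial dims (N*H*W = 4 values)
mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # 1.5
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)   # 0.25 (biased, as BN uses)

eps = 1e-5  # default eps of nn.BatchNorm2d
y = (x - mean) / torch.sqrt(var + eps)
print(y)  # approximately [[-1, -1], [1, 1]]
```

This matches the -1/+1 output above: (1 - 1.5) / sqrt(0.25 + eps) is roughly -1 and (2 - 1.5) / sqrt(0.25 + eps) is roughly +1. It would only break if every value in a channel were identical (variance 0), in which case the output is 0 thanks to eps.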