Minimal Working Example for strange behavior of BatchNorm

import torch
from torch import nn

torch.manual_seed(1)

net = nn.Sequential(
    nn.Linear(100, 100),
    nn.BatchNorm1d(100),
    nn.Linear(100, 1)
)

im = torch.ones(10, 100)

net.train(True)
for _ in range(10):
    print(net(im))

net.train(False)
for _ in range(10):
    print(net(im))

The first 10 prints all output arrays with one identical value (-8.6313 * 1e-2), and the second 10 prints all output arrays with another identical value (-0.1563 * 1e-2).

It is rather strange and confusing.

  1. The first 10 outputs are all the same — is that reasonable? I thought running_mean and running_var start from some initial value and get updated at each step, so the output should change from iteration to iteration.

  2. The second 10 outputs are all the same, which is reasonable since the model is in eval mode. What confuses me is that they are not the same as the last training output. I thought setting the model to evaluation mode just freezes running_mean and running_var, which should make these 10 outputs match the last training output.

Any ideas? This is driving me crazy!

On 1: in training mode, BatchNorm DOES NOT use running_mean and running_var to compute the output. It normalizes each batch with that batch's own statistics and only updates running_mean and running_var internally. Since you feed the exact same batch every iteration, the batch statistics — and hence the output — are identical every time. What you see is exactly what's expected.
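To see this concretely, here is a minimal sketch (my own toy `BatchNorm1d` instance, not your network) showing that in train mode the output is normalized with the batch statistics — per-feature mean ≈ 0 regardless of running_mean — while running_mean is updated on the side:

```python
import torch
from torch import nn

torch.manual_seed(1)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3)

bn.train()
before = bn.running_mean.clone()   # starts at zeros
out = bn(x)
after = bn.running_mean.clone()

# Output is normalized with the *batch* statistics, so its per-feature
# mean is ~0 no matter what running_mean currently holds.
print(out.mean(dim=0))
# running_mean moved toward the batch mean as an internal side effect.
print(before, after)
```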

On 2: at test time, the output is computed using the accumulated running_mean and running_var, which is why it does not match the last training output. The default momentum for BatchNorm in PyTorch is 0.1, meaning the running statistics move 10% of the way toward the current batch statistics on each training forward pass. After 10 iterations they are only partway converged, so the eval output is roughly in the neighborhood of the training output, but not exactly the same. If you run the first for loop for 100 iterations, it'll probably be very close.
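Some quick arithmetic on the update rule (assuming PyTorch's convention `running = (1 - momentum) * running + momentum * batch`, with the batch mean normalized to 1) shows how far the running statistics get after 10 vs. 100 steps:

```python
momentum = 0.1   # PyTorch's default for BatchNorm layers
frac = 0.0       # running statistic, as a fraction of the batch statistic
for step in range(10):
    frac = (1 - momentum) * frac + momentum * 1.0

# After n steps the closed form is 1 - (1 - momentum) ** n:
# ~0.65 after 10 steps, ~0.99997 after 100 steps.
print(frac)
```

So after only 10 training iterations the running statistics are about 65% of the way to the batch statistics, which explains the gap between your train and eval outputs.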

There's also another problem: your input im has a mean of 1 and a standard deviation of 0 — all ten rows are identical. That's a degenerate batch for computing and running-updating batch statistics.
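A small sketch of why that input is degenerate: because all rows of `torch.ones(10, 100)` are identical, the rows coming out of the first Linear layer are identical too, so the batch variance BatchNorm sees is exactly zero:

```python
import torch
from torch import nn

torch.manual_seed(1)
lin = nn.Linear(100, 100)
im = torch.ones(10, 100)   # ten identical rows

h = lin(im)                # rows are still identical to each other
# Per-feature variance across the batch is exactly 0.
print(h.var(dim=0, unbiased=False).max())

# BatchNorm1d would then compute (h - mean) / sqrt(0 + eps): the centered
# activations are all zero, so every sample collapses to the same output.
```

Using e.g. `torch.randn(10, 100)` instead gives BatchNorm a batch with real variance to normalize and to accumulate into the running statistics.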
