‘Frozen’ Batch Normalization still shows different performance in model.train() from model.eval()

HomerNee · September 14, 2018, 8:41am

Hello, everyone. I use a deeplab-v2-resnet model for image segmentation. Because the batch size during training is small, I want to ‘freeze’ the parameters of the BN layers, which are loaded from a pretrained model. I implemented ‘frozen’ BN as follows:

When training, I set momentum = 0 for all nn.BatchNorm2d layers, so I expect running_mean and running_var to stay fixed. I also set requires_grad to False for the parameters() of every nn.BatchNorm2d, so I expect weight (gamma) and bias (beta) to stay fixed too. To check this, I added the following code: I saved the BN parameters from the first step to critemp, and at each later step I collect the current BN parameters into temp and check that they are unchanged.

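        import numpy as np
        import torch

        temp = []  # per-layer averages of the current BN buffers and parameters
        critemp = torch.load("bn_para.pt")  # the same averages, saved at the first step

        def frozen_fn(m):
            # Collect the averaged running stats and affine parameters of each BN layer.
            classname = m.__class__.__name__
            if classname.find('BatchNorm2d') != -1:
                temp.append([np.average(m.running_mean.data.cpu().numpy()),
                             np.average(m.running_var.data.cpu().numpy()),
                             np.average(m.weight.data.cpu().numpy()),
                             np.average(m.bias.data.cpu().numpy())])

        model.apply(frozen_fn)
        # Compare with a tolerance instead of exact float equality.
        assert np.allclose(temp, critemp)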

When testing, I call model.eval() directly, and I have confirmed that the parameters of the BN layers are the same as during training. But the results are quite terrible. When I switch the mode from eval to train, the results turn out to be much better. I think the parameters in train and eval mode should be exactly the same (I don’t use dropout), so why do I still get different performance?

Can anyone help me? Thanks!

SimonW · September 14, 2018, 7:00pm

In train mode, you are using batch stats (not running stats). Since you disabled running stats updates, the running stats in eval mode are still the old stats, and since eval mode normalizes with these old running stats, the results will be bad.
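
A minimal sketch of this behaviour, assuming a single standalone nn.BatchNorm2d layer with momentum=0 and a random input (the layer size and input are made up for illustration, not taken from the model above). It suggests that the running stats do stay fixed, yet the train-mode output is still normalized with the statistics of the current batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy layer: momentum=0 keeps the running stats at their initial values.
bn = nn.BatchNorm2d(3, momentum=0.0)
# Input whose mean and variance are far from the initial running stats (0 and 1).
x = torch.randn(4, 3, 8, 8) * 5.0 + 2.0

bn.train()
out_train = bn(x)  # normalized with the mean/var of this batch
running_mean_after = bn.running_mean.clone()
running_var_after = bn.running_var.clone()

bn.eval()
out_eval = bn(x)   # normalized with running_mean / running_var

print("running_mean after a train step:", running_mean_after)
print("mean of train-mode output:", out_train.mean().item())
print("mean of eval-mode output: ", out_eval.mean().item())
```

The running stats are unchanged after the forward pass in train mode, but the two outputs differ, which is consistent with the gap between model.train() and model.eval() described above.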

HomerNee · September 15, 2018, 2:44am

Hello, thanks for your reply! But I think that in train mode, since I set momentum to 0 and running stats = (1-momentum)*history + momentum*current batch stats, the running stats should depend only on history. So train() mode and eval() mode should both use the old (pretrained) running stats.

Is there any problem with my understanding?

SimonW · September 15, 2018, 2:45am

Batch norm in training mode uses batch stats, not running stats.

HomerNee · September 15, 2018, 2:51am

Hello, The following content is directly copied from torch.nn — PyTorch 2.1 documentation

By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.

So does this mean that BN in train mode still uses running stats? Thanks~

SimonW · September 15, 2018, 3:10am

In training mode, it uses batch stats, i.e., the mean and variance computed using input data only in that batch, not the running average stats. Hope that this clarifies things.
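
One common way to get BN that is actually frozen during training is to keep the BN layers themselves in eval mode while the rest of the model trains. The following is only a sketch under that assumption: freeze_batchnorm is a hypothetical helper name, and the small nn.Sequential model stands in for the real network.

```python
import torch
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Hypothetical helper: make every BatchNorm2d use its running stats."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # normalize with running stats and never update them
            for p in m.parameters():
                p.requires_grad_(False)  # keep weight (gamma) and bias (beta) fixed

# Toy stand-in for the real network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())

model.train()            # puts every submodule, including BN, in training mode
freeze_batchnorm(model)  # so the helper has to be called after each model.train()
bn_is_training = model[1].training

x = torch.randn(2, 3, 16, 16)
out_frozen = model(x)    # BN layer uses running stats here

model.eval()
out_eval = model(x)

print("BN in training mode:", bn_is_training)
print("outputs match:", torch.allclose(out_frozen, out_eval))
```

Because model.train() switches every submodule back to training mode, the helper needs to run again after each such call, for example at the start of every epoch.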

HomerNee · September 15, 2018, 3:28am

Thanks for your reply! I will read the original paper on this and check whether I misunderstood this point. Thanks for your time~

HomerNee · September 18, 2018, 4:56am

Yes, you are right!!! Thank you very much!