‘Frozen’ Batch Normalization still performs differently in model.train() and model.eval()


#1

Hello, everyone. I use the deeplab-v2-resnet model for image segmentation. Because the batch size is small during training, I want to ‘freeze’ the parameters of the BN layers, which are loaded from a pretrained model. I implement ‘frozen’ BN as follows:

During training, I set momentum = 0 for every nn.BatchNorm2d, so I expect running_mean and running_var to stay fixed. I also set requires_grad to False for the parameters() of every nn.BatchNorm2d, so I expect weight (gamma) and bias (beta) to stay fixed. To check this, I added the following code: I save the BN values from the first step to critemp, then at each step I collect them again in temp and check that they are unchanged.

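        import numpy as np
        import torch

        temp = []
        # Per-layer averages saved at the first training step.
        critemp = torch.load("bn_para.pt")

        def frozen_fn(m):
            # Collect the average of each BN buffer and parameter.
            classname = m.__class__.__name__
            if classname.find('BatchNorm2d') != -1:
                temp.append([np.average(m.running_mean.data.cpu().numpy()),
                             np.average(m.running_var.data.cpu().numpy()),
                             np.average(m.weight.data.cpu().numpy()),
                             np.average(m.bias.data.cpu().numpy())])

        model.apply(frozen_fn)
        # Fails if any BN statistic or parameter changed since the first step.
        assert temp == critemp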

At test time, I call model.eval() directly, and I have verified that the parameters of the BN layers are the same as during training. But the results are terrible. When I switch the mode from eval to train, the results become much better. I would expect train and eval to use exactly the same parameters (I don’t use dropout), so why do I still get different performance?

Can anyone help? Thanks!


(Simon Wang) #2

In train mode, the layer normalizes with batch stats (not running stats). Since you disabled the running stats update, the running stats are still the old ones, and eval mode normalizes with these old running stats. The two modes therefore compute different outputs, and the eval results will be bad.
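
A minimal sketch of this difference, assuming a standalone nn.BatchNorm2d layer and random input (the channel count, input shape, and scaling are made up for illustration and are not taken from the deeplab model). With momentum = 0 the running buffers stay at their initial values, yet the outputs of the two modes still differ:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Frozen" the way the first post describes: momentum = 0, no gradients.
bn = nn.BatchNorm2d(3, momentum=0.0)
for p in bn.parameters():
    p.requires_grad = False

# Input whose statistics are far from the initial running stats (mean 0, var 1).
x = torch.randn(4, 3, 8, 8) * 5 + 10

bn.train()
out_train = bn(x)   # normalized with the mean/var of this batch
bn.eval()
out_eval = bn(x)    # normalized with running_mean / running_var

print(out_train.mean().item())  # close to 0
print(out_eval.mean().item())   # roughly 10, the input is barely shifted
print(bn.running_mean)          # unchanged, still all zeros
```

So the check that the buffers and parameters are unchanged can pass while the forward pass in train mode never reads those buffers.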


#3

Hello, thanks for your reply! But in train mode I set momentum to 0, and running stats = (1 - momentum) * history + momentum * current batch stats, so the running stats should depend only on history. So in both train() mode and eval() mode, the layer uses the old (pretrained) running stats.
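
As a sketch of the update rule quoted above, in plain Python (`update_running_stat` is a hypothetical helper written for illustration, not a PyTorch function, and the numbers are made up):

```python
def update_running_stat(running, batch_stat, momentum):
    """Moving-average update: (1 - momentum) * history + momentum * current."""
    return (1 - momentum) * running + momentum * batch_stat

running_mean = 0.5  # stands in for a value loaded from the pretrained model
for batch_mean in [2.0, -1.0, 3.5]:
    running_mean = update_running_stat(running_mean, batch_mean, momentum=0.0)

print(running_mean)  # still 0.5: with momentum = 0 the buffer never moves
```

This only describes how the stored running value is updated. It does not say which statistics are used to normalize the input during a forward pass.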

Is there any problem with my understanding?


(Simon Wang) #4

Batch norm in training mode uses batch stats, not running stats.


#5

Hello, the following is copied directly from https://pytorch.org/docs/stable/nn.html

By default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.

So does it mean BN in train mode still uses running stats? Thanks~


(Simon Wang) #6

In training mode, it uses batch stats, i.e., the mean and variance computed using input data only in that batch, not the running average stats. Hope that this clarifies things.
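
If the goal is for training to normalize with the pretrained running stats, one common approach is to keep the BN layers themselves in eval mode while the rest of the model trains. A minimal sketch, assuming a small stand-in model (`freeze_bn` is a hypothetical helper name, and the nn.Sequential below is not the deeplab network):

```python
import torch
import torch.nn as nn

def freeze_bn(model):
    """Put every BatchNorm2d layer in eval mode and disable its gradients."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # forward pass now uses running_mean / running_var
            for p in m.parameters():
                p.requires_grad = False

# Stand-in model for illustration only.
model = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU())

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)

model.train()     # switches every submodule, including BN, to training mode
freeze_bn(model)  # so the freeze has to be re-applied after each model.train()
out_frozen_train = model(x)

model.eval()
out_eval = model(x)

# With BN in eval mode, both passes normalize with the same running stats.
print(torch.allclose(out_frozen_train, out_eval))
```

Because model.train() resets the mode of all submodules, the helper needs to be called again whenever the model is switched back to training.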


#7

Thanks for your reply! I will read the original paper directly and check whether I misunderstood this point. Thanks for your time~


#9

Yes, you are right! Thank you very much!