Proper BatchNorm2d Mode during Fine-Tuning

During fine-tuning on different datasets, if I set the BatchNorm2d layers to evaluation mode, the accuracy increases from 63% to 80%.

However, if I leave the BatchNorm2d layers in training mode, the performance drops drastically from 63% (before fine-tuning started) to only 39% in the first iteration.

  1. How can I maintain the performance while leaving the BatchNorm2d layers in training mode during fine-tuning?

  2. What BatchNorm2d settings are equivalent to putting it in evaluation mode?

  1. How large is your batch size during fine-tuning? Note that the updates of the running stats might be off if the batch size is too small. You could change the momentum to smooth the updates a bit in this case.
    Also, does the accuracy recover at some point?

  2. During evaluation the running estimates of the mean and std are used instead of the current batch statistics (see the sketch below).
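
A small runnable sketch of this difference, using a toy BatchNorm2d layer rather than your actual model:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16)

# Training mode: normalizes with the statistics of the current batch and
# updates running_mean / running_var using the configured momentum.
bn.train()
out_train = bn(x)

# Evaluation mode: normalizes with the stored running estimates instead of
# the batch statistics; running_mean / running_var are left untouched.
bn.eval()
out_eval = bn(x)

print(torch.allclose(out_train, out_eval))  # generally False
```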


Hi, thank you for the quick response.

  1. The batch size is set to 64 during fine-tuning in my experiments. And yes, I tried different values of momentum (0.01, 0.1, 0.2, up to 1), and the best accuracy I observed is only 72% (momentum = 0.1), compared to 80% with BatchNorm2d in evaluation mode.

  2. Yes, I tried to manually replicate evaluation mode to see how batch norm works by setting child_layer.momentum = 0, child_layer.weight.requires_grad = False, and child_layer.bias.requires_grad = False, but the results are totally different from those of simply calling child_layer.eval(). May I have your advice on this? A minimal sketch of this attempt is below.
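
For reference, this is roughly what the attempt looks like (the Sequential model here is just a placeholder for the actual fine-tuned network):

```python
import torch.nn as nn

# Placeholder model standing in for the actual fine-tuned network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())

for child_layer in model.modules():
    if isinstance(child_layer, nn.BatchNorm2d):
        child_layer.momentum = 0                  # freezes the running-stat updates
        child_layer.weight.requires_grad = False  # freezes the affine scale
        child_layer.bias.requires_grad = False    # freezes the affine shift
```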

Batchnorm layers contain trainable parameters (weight and bias), if affine=True (the default), as well as running estimates (running_mean and running_var).
bn.train()/eval() only changes how the running estimates are used and updated; the affine parameters are not affected by it.
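
A minimal sketch of this distinction, assuming a standard nn.BatchNorm2d (the Sequential model is just a toy example): .eval() leaves weight and bias trainable, and one common fine-tuning pattern is to keep the model in training mode while putting only the BatchNorm2d layers into eval mode.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
bn.eval()
# eval() does not touch the affine parameters; they stay trainable:
print(bn.weight.requires_grad, bn.bias.requires_grad)  # True True

# Setting momentum = 0 only freezes the running-stat updates; in training
# mode the layer still normalizes with the current batch statistics, which
# is why that approach behaves differently from calling .eval().

# One common pattern: keep the rest of the model in training mode, but let
# the BatchNorm2d layers normalize with their stored running estimates.
def set_bn_eval(module):
    if isinstance(module, nn.BatchNorm2d):
        module.eval()

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
model.train()
model.apply(set_bn_eval)

x = torch.randn(4, 3, 16, 16)
out = model(x)  # conv/ReLU run in train mode, BN uses running_mean / running_var
```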