During fine-tuning on a different dataset, if I set the BatchNorm2d layers to evaluation mode, the accuracy increases from 63% to 80%.
However, if I leave the BatchNorm2d layers in training mode, the performance drops drastically from 63% (before fine-tuning) to only 39% in the first iteration.
-
How can I maintain the performance while leaving the BatchNorm2d layers in training mode during fine-tuning?
-
What setting of BatchNorm2d is equivalent to putting it in evaluation mode?
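For reference, a common pattern for the first question is to call `model.train()` and then switch only the batch-norm modules back to eval mode, so the rest of the network still trains normally. A minimal sketch (the small `nn.Sequential` model here is a hypothetical stand-in for the fine-tuned network):

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the fine-tuned network
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

model.train()  # keep the rest of the model in training mode
# Switch only the batch-norm layers to eval mode, so their forward pass
# uses the stored running_mean/running_var instead of per-batch statistics
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()

print(model[0].training)  # → True  (conv still trains)
print(model[1].training)  # → False (batch norm uses running estimates)
```

The affine parameters of the batch-norm layers still receive gradients and can still be updated by the optimizer; only the normalization statistics are frozen.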
Hi, thank you for the quick response.
-
The batch size is set to 64 during fine-tuning in my experiments. And yes, I tried different momentum values (0.01, 0.1, 0.2, up to 1) and observed that the best accuracy is only 72% (at momentum = 0.1), compared to 80% with BatchNorm2d in evaluation mode.
-
Yes, I tried to manually replicate the evaluation mode to see how batch norm works by setting child_layer.momentum = 0, child_layer.weight.requires_grad = False, and child_layer.bias.requires_grad = False, but the results are totally different from those of simply calling child_layer.eval(). May I have your advice on this?
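One likely reason these two setups differ: momentum = 0 only freezes the *updates* to the running estimates, but in train mode the forward pass still normalizes each batch with the *batch* statistics, whereas eval mode normalizes with the stored running estimates. A minimal sketch illustrating the difference (the warm-up loop just makes the running estimates differ from the test batch's statistics):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 8, 4, 4)

bn = nn.BatchNorm2d(8)
# Warm up the running estimates on data with a different distribution,
# so they differ from the statistics of the batch `x`
with torch.no_grad():
    for _ in range(10):
        bn(torch.randn(64, 8, 4, 4) * 2 + 1)

# Train mode with momentum = 0: the running stats stop updating, but the
# forward pass still normalizes with the current batch statistics
bn.train()
bn.momentum = 0.0
with torch.no_grad():
    out_train = bn(x)

# Eval mode: the forward pass normalizes with the running estimates
bn.eval()
with torch.no_grad():
    out_eval = bn(x)

print(torch.allclose(out_train, out_eval))  # → False: different statistics used
```

So freezing momentum and the affine gradients is not equivalent to `eval()`; only `eval()` switches the normalization itself over to the running estimates.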
Batchnorm layers contain trainable parameters (`weight` and `bias`), if `affine=True` (the default), as well as running estimates (`running_mean` and `running_var`).
`bn.train()`/`bn.eval()` changes the behavior of the running estimates; the affine parameters are not changed by it.
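This split can be checked directly: a forward pass in train mode updates the running estimates, a forward pass in eval mode leaves them untouched, and neither mode touches `weight`/`bias` (those only change via an optimizer step). A minimal sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)
w_before = bn.weight.detach().clone()
rm_init = bn.running_mean.clone()  # starts at zeros

# Data with a non-zero mean, so the running estimates visibly move
x = torch.randn(32, 4, 8, 8) + 3.0

bn.train()
with torch.no_grad():
    bn(x)  # train-mode forward updates running_mean/running_var
print(torch.allclose(bn.running_mean, rm_init))  # → False: estimates moved

rm_after = bn.running_mean.clone()
bn.eval()
with torch.no_grad():
    bn(x)  # eval-mode forward leaves the running estimates untouched
print(torch.allclose(bn.running_mean, rm_after))  # → True

# The affine parameters were not changed by train()/eval() or the forwards
print(torch.allclose(bn.weight, w_before))  # → True
```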