When finetuning a model from a pretrained one, Caffe users would set the BN layers with “use_global_stats: true”, which makes finetuning use the mean and variance stored in the pretrained model instead of recomputing them. In my work, I have found that this setting is sometimes important for performance. What should I do in PyTorch if I want to keep the already-learned mean and variance, rather than updating the moving averages, during finetuning?
Put the model in training mode, then switch just the BN layers back to eval mode so their running statistics are used and no longer updated:

```python
import torch.nn as nn

model.train(True)
for m in model.modules():
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
        m.eval()
# Run your training here
```
You can further set requires_grad=False on the BN layers' parameters, but that only affects the learnable weights and biases, not the running mean and variance.
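A quick way to convince yourself this works is to check a standalone BatchNorm layer: in train mode a forward pass updates the running statistics, while in eval mode they stay frozen. A minimal sketch (the tensor shapes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4)

# Train mode: a forward pass updates running_mean via the momentum rule.
bn.train()
before = bn.running_mean.clone()
bn(x)
assert not torch.allclose(bn.running_mean, before)

# Eval mode: the stored statistics are used for normalization
# and are left untouched by the forward pass.
bn.eval()
frozen = bn.running_mean.clone()
bn(x)
assert torch.allclose(bn.running_mean, frozen)
```

The same check applies to running_var. Note that eval() only freezes the statistics; to also stop gradient updates to the affine parameters, set requires_grad=False as mentioned above.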