I want to train fully convolutional networks (FCNs) for pixel-wise semantic segmentation, initialized from vgg_bn or resnet (both of which contain BatchNorm layers). However, training an FCN is memory intensive, so a batch size of 1 is often chosen. With a batch size of 1, finetuning the BatchNorm layers is pointless, so I wish to freeze both the parameters and the running averages of the BN layers during finetuning.
My question is how to realize this.
I have tried setting requires_grad to False for all BN layers, but the running averages still change after each forward pass.
Can anyone help?
You need to set the batchnorm modules to eval() mode. If you are calling batchnorm through the functional interface, this means passing training=False. Otherwise, just make sure .eval() has been called on all the modules you aren't training. Note that .train() and .eval() calls propagate to all child modules, so if you call .train() on your whole network you'll have to call .eval() separately afterwards on the modules you don't want to finetune.
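For reference, a minimal sketch of this (using a small hypothetical network standing in for a VGG/ResNet backbone): put every BatchNorm module into eval() mode so the running statistics stop updating, and set requires_grad = False on the affine parameters so they receive no gradients.

```python
import torch
import torch.nn as nn

# Hypothetical small network standing in for a pretrained backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1),
    nn.BatchNorm2d(16),
)

def freeze_bn(module):
    """Freeze all BatchNorm layers: stop running-stat updates and gradients."""
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()                        # use stored running stats; no updates
            m.weight.requires_grad = False  # freeze affine scale (gamma)
            m.bias.requires_grad = False    # freeze affine shift (beta)

model.train()      # rest of the network stays in training mode
freeze_bn(model)   # but BN layers are frozen
```

As discussed above, remember that freeze_bn must be re-applied after any subsequent .train() call on the parent network, since .train() propagates to the BN children and flips them back into training mode.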
Hi @ajbrock, thanks for your help. I used nn.BatchNorm2d and called .eval() on those modules in the init function, but I think the eval state is reset by the parent module's .train() calls.
Yes, that is precisely what I said: you need to call .eval() on the modules you want in inference mode after you call .train() on the parent modules.
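One way (a sketch, not the only option) to make this automatic is to override the network's train() method so that BN modules are put back into eval mode every time the parent is switched to training mode; the class and layer sizes below are hypothetical:

```python
import torch
import torch.nn as nn

class FrozenBNNet(nn.Module):
    """Hypothetical network whose BatchNorm layers stay frozen in eval mode."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
        )

    def train(self, mode=True):
        # Propagate the mode to all children as usual...
        super().train(mode)
        if mode:
            # ...then re-apply .eval() to the BN layers so they keep
            # using their stored running statistics.
            for m in self.modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.eval()
        return self

net = FrozenBNNet()
net.train()  # BN layers remain in eval mode despite this call
```

This way you never have to remember to call .eval() manually after each .train() on the parent.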