Hi,
I’m currently working on finetuning a large CNN for semantic segmentation and due to GPU memory limitations I can only use a batch size of one.
The CNN I’m using has a bunch of batch normalization layers, which I wan’t to fix during training (since batch normalization with batch size 1 does not make sense). The idea is to set the mode of the batchnorm layers to eval during training. I use the following code to do this
net.train()
for module in net.modules():
if isinstance(module, torch.nn.modules.BatchNorm1d):
module.eval()
if isinstance(module, torch.nn.modules.BatchNorm2d):
module.eval()
if isinstance(module, torch.nn.modules.BatchNorm3d):
module.eval()
But I can’t seem to get the training to work.
I’ve debugging by doing the same on training code that works fine with a large batch size and not freezing the batch normalization layers. Fixing the batch norm layers will then make the training diverge.
Am I missing something here, does the batchnorm layers backprop function work in eval mode? Is there anything else I need to be doing?
If your BatchNorm layers are set to affine=True, they have the weight and bias parameters (gamma and beta in the paper), which are initialized with some values. If you don’t train these layers at all, both parameters will stay in this initial state.
If you still want to standardize your activations with the running_mean and running_var, I would suggest to set affine=False and test it again.
Thank you for your reply!
They are set to affine=True. Since I’m loading a pretrained model I believe that I need these parameters to be able to get the correct pretrained network.
What I’m trying to achieve is to keep the gamma and beta from the pretrained network. As well as the estimate of the running_mean and running_var. I then want to train the network without updating running_mean and running_var (since I can’t fit a large enough batch in the GPU memory). This seems to break down the training in some way.