Proper way of fixing batchnorm layers during training

I’m currently working on finetuning a large CNN for semantic segmentation and due to GPU memory limitations I can only use a batch size of one.

The CNN I’m using has a bunch of batch normalization layers, which I want to fix during training (since batch normalization with a batch size of 1 does not make sense). The idea is to set the mode of the batchnorm layers to eval during training. I use the following code to do this:

for module in net.modules():
    if isinstance(module, (torch.nn.BatchNorm1d,
                           torch.nn.BatchNorm2d,
                           torch.nn.BatchNorm3d)):
        module.eval()

But I can’t seem to get the training to work.

I’ve debugged by doing the same thing to training code that works fine with a large batch size when the batch normalization layers are not frozen. Fixing the batch norm layers there also makes the training diverge.

Am I missing something here? Does the batchnorm layers’ backprop function work in eval mode? Is there anything else I need to be doing?
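For what it’s worth, backprop through BatchNorm does work in eval mode: the layer normalizes with the stored running statistics instead of batch statistics, and gradients still flow to both the input and the affine parameters. A minimal check (the shapes and module here are just illustrative):

```python
import torch

bn = torch.nn.BatchNorm2d(3)
bn.eval()  # use running stats; do not update them

x = torch.randn(1, 3, 8, 8, requires_grad=True)
mean_before = bn.running_mean.clone()

out = bn(x)
out.sum().backward()

# gradients flow through BN even in eval mode
assert x.grad is not None
assert bn.weight.grad is not None
# running statistics were not updated by the forward pass
assert torch.equal(bn.running_mean, mean_before)
```

So eval-mode BN by itself should not break training; if it does, the problem is usually elsewhere (e.g. the mode being reset, or a bug in the surrounding code).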


If your BatchNorm layers are set to affine=True, they have the weight and bias parameters (gamma and beta in the paper), which are initialized with some values. If you don’t train these layers at all, both parameters will stay in this initial state.

If you still want to standardize your activations with the running_mean and running_var, I would suggest setting affine=False and testing it again.

Let me know, if this helps!

Thank you for your reply!
They are set to affine=True. Since I’m loading a pretrained model I believe that I need these parameters to be able to get the correct pretrained network.

What I’m trying to achieve is to keep the gamma and beta from the pretrained network, as well as the estimates of running_mean and running_var. I then want to train the network without updating running_mean and running_var (since I can’t fit a large enough batch in GPU memory). This seems to break down the training in some way.
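One way to get exactly that — keep the pretrained gamma/beta and running stats, and stop updates to the running stats — is to put only the BN layers in eval mode while the rest of the net stays in train mode, and optionally turn off gradients on gamma/beta too. A sketch, using the common base class `torch.nn.modules.batchnorm._BatchNorm` (private but covers the 1d/2d/3d variants; you can also check the three concrete classes) and a toy `net` as a stand-in for the pretrained model:

```python
import torch

def freeze_bn(net, freeze_affine=False):
    """Put all BatchNorm layers in eval mode so running stats stay fixed.
    Optionally also stop optimizer updates to gamma/beta."""
    for m in net.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.eval()
            if freeze_affine and m.affine:
                m.weight.requires_grad_(False)
                m.bias.requires_grad_(False)

net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)
net.train()
freeze_bn(net, freeze_affine=True)

bn = net[1]
assert not bn.training           # BN layer is in eval mode
assert net[0].training           # rest of the net still trains
assert not bn.weight.requires_grad
```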

I had the same problem. It turned out I was setting eval mode wrong.
This thread helped me Freeze BatchNorm layer lead to NaN

def set_bn_to_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

If you want to set_bn_to_eval on some subnet or some base network, then just call .apply(set_bn_to_eval) on that submodule.

hope this helps.
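One gotcha worth flagging: any later call to net.train() (e.g. at the top of each epoch) flips the BN layers back to train mode, so the eval setting has to be re-applied after every such call. Assuming a set_bn_to_eval like the one above:

```python
import torch

def set_bn_to_eval(m):
    if m.__class__.__name__.find('BatchNorm') != -1:
        m.eval()

net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
)

for epoch in range(2):
    net.train()                  # resets every module, BN included
    net.apply(set_bn_to_eval)    # so re-freeze BN each epoch
    assert not net[1].training   # BN back in eval mode
    # ... training loop ...
```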


Hi @maunz, I am facing the exact same issue. By any chance did you fix it?

I noticed an error in my pretrained network unrelated to the batchnorm layers. When I fixed that, setting the batchnorm layers to .eval() worked.