I have a well-trained coarse net (including BN layers) that I want to freeze while fine-tuning other layers I added. I filtered out the parameters of the coarse net when constructing the optimizer. Is model.eval() or something else necessary in this case? I don’t want the BN layers to recalculate the mean and variance in every batch.
If you set your nn.BatchNorm layers to eval(), the running estimates won’t be updated anymore.
In addition to filtering out the parameters, you could also set their .requires_grad attribute to False, so that gradients won’t be computed where they aren’t needed.
Thanks!! But I am still a little confused.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fine.parameters():
        p.requires_grad = True
    optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)
This is my code for fine-tuning, and I didn’t call model.eval(). Do the BN layers in my model behave the same as BN layers in a model that has been set to eval()?
If not, what should I do to freeze the BN layers (make them use the global means and variances instead of those of each mini-batch)?
I am looking forward to your reply.
If you only want to fine-tune the parameters in model.fine, you could do the following:
    model = ...
    optimizer = torch.optim.Adam(model.fine.parameters(), lr=0.001)
Now the optimizer will only try to update the parameters within model.fine.
It depends on whether they were set to .eval() before, but the default mode after loading the model is train().
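To make the point above concrete, here is a minimal sketch of what happens when only the optimizer is restricted and the model is left in its default train() mode. The `Net` class, its `coarse`/`fine` split, and the layer sizes are hypothetical stand-ins for the model in the question, not the asker's actual architecture.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical model: a pretrained 'coarse' part with BN plus a new 'fine' head."""

    def __init__(self):
        super().__init__()
        self.coarse = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())
        self.fine = nn.Linear(8, 2)

    def forward(self, x):
        return self.fine(self.coarse(x))


torch.manual_seed(0)
model = Net()  # a freshly created model is in train() mode

# Freeze the coarse parameters and optimize only the fine ones.
for p in model.coarse.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.fine.parameters(), lr=0.001)

bn = model.coarse[1]
weight_before = model.coarse[0].weight.clone()
stats_before = bn.running_mean.clone()

loss = model(torch.randn(16, 4)).sum()
loss.backward()
optimizer.step()

# The coarse weights are untouched, but the BN running stats were still updated.
weights_frozen = torch.equal(weight_before, model.coarse[0].weight)
stats_changed = not torch.equal(stats_before, bn.running_mean)
print(weights_frozen, stats_changed)
```

In this sketch the frozen weights stay fixed while the running mean moves, which is why the BN layers need to be put into eval mode separately.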
If you want to set the complete model to eval mode, just use model.eval().
Alternatively, if you just want to apply it to all batch norm layers, you could use:
    def set_bn_eval(module):
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.eval()

    model.apply(set_bn_eval)
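One thing to keep in mind with this helper: model.train() sets every submodule back to training mode, so the helper has to be re-applied after each model.train() call (for example at the start of every epoch). A small sketch, assuming a hypothetical toy model in place of the real one:

```python
import torch
import torch.nn as nn


def set_bn_eval(module):
    # Put only batchnorm layers into eval mode so they use their running stats.
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.eval()


# Hypothetical toy model standing in for the real network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, padding=1),
)

model.train()             # flips all submodules, including BN, to training mode
model.apply(set_bn_eval)  # so the BN layers are switched back afterwards

bn = model[1]
before = bn.running_mean.clone()
model(torch.randn(16, 3, 32, 32))
after = bn.running_mean.clone()

print(model[0].training, bn.training, torch.equal(before, after))
```

To freeze only the BN layers of one part of the model, the same helper can be applied to that submodule instead, e.g. model.coarse.apply(set_bn_eval) if the frozen part is stored under an attribute named coarse.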
Hi @ptrblck, a little more detail on this: does setting the batch norm layer to eval allow us to train the ‘gamma’ and beta parameters? I understand that the eval operation allows us to use the current batch’s mean and variance when fine-tuning a pretrained model.
The .eval() call on batchnorm layers does not freeze the affine parameters, so the gamma (weight) and beta (bias) parameters can still be trained.
eval() on batchnorm layers will use the running stats, while train() will use the batch stats and update the running stats.
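A quick way to check this is to run a backward pass and an optimizer step on a batchnorm layer in eval mode. The following is a minimal sketch with an arbitrary layer size and loss, chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
bn.eval()  # use running stats; affine parameters still require gradients

weight_before = bn.weight.detach().clone()
bias_before = bn.bias.detach().clone()
stats_before = bn.running_mean.clone()

optimizer = torch.optim.SGD(bn.parameters(), lr=0.1)
out = bn(torch.randn(8, 4))
loss = out.pow(2).sum()  # arbitrary loss, only used to produce gradients
loss.backward()
optimizer.step()

# gamma (weight) was updated by the optimizer even though the layer is in eval mode.
gamma_trained = not torch.equal(weight_before, bn.weight.detach())
# The running stats were not touched by the forward pass.
stats_unchanged = torch.equal(stats_before, bn.running_mean)
print(gamma_trained, stats_unchanged)
```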
Thanks, that clarification was useful.
Hi @ptrblck, one follow-up on your earlier response. Is setting requires_grad = False sufficient, even for batch norm layers? Or do we have to do both, i.e., set requires_grad = False AND call eval() on the layer?
Changing the requires_grad attribute of a batchnorm layer’s parameters and calling train()/eval() on it behave differently.
BatchNorm layers use trainable affine parameters by default, which are assigned to the .weight and .bias attributes. These parameters use .requires_grad = True by default, and you can freeze them by setting this attribute to False.
During training (i.e. after calling model.train() or right after creating the model), the batchnorm layer will normalize the input activation using the batch stats and will update its internal stats using a running average.
After calling model.eval(), the batchnorm layers will use the trained internal running stats (stored as .running_mean and .running_var) to normalize the input activation.
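The two modes can be compared directly on a single layer. This sketch assumes the default momentum of 0.1 and the initial running stats of a fresh layer (mean 0, variance 1); the input shape and scale are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)             # default momentum=0.1
x = torch.randn(32, 3) * 2.0 + 5.0  # batch with mean around 5

# Train mode: normalize with batch stats and update the running stats.
bn.train()
out_train = bn(x)
expected_mean = 0.9 * torch.zeros(3) + 0.1 * x.mean(dim=0)
expected_var = 0.9 * torch.ones(3) + 0.1 * x.var(dim=0, unbiased=True)
mean_ok = torch.allclose(bn.running_mean, expected_mean, atol=1e-6)
var_ok = torch.allclose(bn.running_var, expected_var, atol=1e-5)

# Eval mode: normalize with the stored running stats and leave them unchanged.
bn.eval()
snapshot = bn.running_mean.clone()
out_eval = bn(x)
stats_unchanged = torch.equal(snapshot, bn.running_mean)

print(mean_ok, var_ok, stats_unchanged)
print(out_train.mean(dim=0))  # close to zero: batch stats were used
print(out_eval.mean(dim=0))   # far from zero: running stats were used
```

After only one training step the running stats are still far from the batch stats, so the eval output is not centered; with a well-trained model the stored stats would match the data distribution much more closely.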
Got it, thanks so much for your detailed response! So, if I set just requires_grad = False for the BN layers, they may still update the running average during the training phase, which is not ideal. I should do both, so that the batchnorm layer uses the stored .running_mean and .running_var values for normalization.
If you want to validate your model, wrapping the forward pass in with torch.no_grad() or with torch.inference_mode() and calling model.eval() would also work. You wouldn’t necessarily need to flip the .requires_grad attribute (that would also work, but the former guards might be simpler).
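A validation step along these lines could look like the following sketch. The model and input shapes are hypothetical placeholders; the point is only that the parameters keep requires_grad = True while no graph is built for the forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical small model with a batchnorm layer.
model = nn.Sequential(
    nn.Linear(10, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

model.eval()               # BN layers use their running stats
with torch.no_grad():      # no autograd graph is recorded for this forward pass
    out = model(torch.randn(4, 10))

params_still_trainable = all(p.requires_grad for p in model.parameters())
print(out.requires_grad, params_still_trainable)

model.train()              # switch back before continuing training
```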