Model.train and requires_grad

SreenivasVRao · September 24, 2018, 2:39pm

Hi all,

I’m not getting the usage of model.train(False) and param.requires_grad(False).

It seems to me that if I execute the following:

class Test(nn.Module):

    def __init__(self):
        super(Test, self).__init__()
        self.build_model()
        self.train(False)
        for param in self.parameters():
            print param.requires_grad

        exit(0)

I should get False for param.requires_grad. I get True however.

So what’s the difference between model.train(False) and:

for param in self.parameters():
     param.requires_grad = False

?

What are the best practices when trying to initialize a model and freeze weights?

Thanks!

colesbury · September 24, 2018, 10:13pm

It’s a bit confusing, but model.train(False) doesn’t change param.requires_grad. It only changes the behavior of nn.Dropout and nn.BatchNorm (and maybe a few other modules) to use the inference-mode behavior. This disables the stochastic behavior of Dropout and using the running_mean/var in BatchNorm instead of batch statistics.

If you want to freeze model weights, you should use the code snippet you wrote above:

for param in model.parameters():
     param.requires_grad = False

Depending on how you want BatchNorm to behave, you may want to call model.train(False) or call it on some sub-module of your model.

SreenivasVRao · September 25, 2018, 1:05am

Yeah, I ended up asking a friend. Maybe the documentation needs to be clearer on this. And just to follow up, param.requires_grad=False will make the optimizer throw a fuss unless you explicitly set it to only optimize those parameters that require gradient.

Borg · March 12, 2019, 10:04pm

I was wondering if we need to do something for the optimizer to make optimizer not change the parameter of the freezed layer? Like:

torch.optim.Adam(filter(lambda p: p.requires_grad, self.netG.parameters()), lr=opt.lr,
                                                betas=(opt.beta1, 0.999))

https://discuss.pytorch.org/t/how-the-pytorch-freeze-network-in-some-layers-only-the-rest-of-the-training/7088/9

MariosOreo · March 13, 2019, 5:25am

Hi,

I just have a try on requires_grad and params_groups in optimizer.

Set requires_grad=False can no longer calculate gradients of the related module and keep their grad None. Configuring optimizer can make the params don’t update in opt.step() but their gradients still calculate.