To freeze layers, is it enough to set requires_grad to False?

Very similar questions have been asked before, but this one is subtly different.

Like many, I want to freeze some layers of my neural network. I understand that I can simply not pass those parameters to the optimizer. I also understand that I should set requires_grad to False for those layers, so that backpropagation doesn’t do extra work.
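For concreteness, here is roughly the setup I mean (a minimal sketch; the two-layer model and hyperparameters are made up):

```python
import torch
import torch.nn as nn

# Made-up two-layer model, just to illustrate the setup.
model = nn.Sequential(
    nn.Linear(10, 10),  # the layer I want to freeze
    nn.Linear(10, 2),   # stays trainable
)

# Freeze the first layer by turning off gradient tracking.
for p in model[0].parameters():
    p.requires_grad = False

# What I'd like to do: hand *all* parameters to the optimizer and rely on
# requires_grad=False alone (weight decay enabled, which is what worries me).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```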

But I want to know if only setting requires_grad to False is sufficient for the right thing to happen.

I’m worried that parameters without gradients are treated as having zero gradients by the optimizer. That would mean the optimizer would still apply weight decay to those layers.

Does anybody know for sure if that’s a real problem or not?

I’m going to be changing which layers are frozen during training, and I’d rather not have to create a new optimizer every time I change requires_grad for a layer.
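The kind of toggling I have in mind looks roughly like this (again just a sketch, with a dummy batch and a made-up freeze schedule):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

def set_trainable(module, trainable):
    # Toggle gradient tracking for every parameter of a (sub)module.
    for p in module.parameters():
        p.requires_grad = trainable

for epoch in range(10):
    # Made-up schedule: first layer frozen for the first five epochs only.
    set_trainable(model[0], trainable=(epoch >= 5))
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))  # dummy batch
    optimizer.zero_grad(set_to_none=True)  # clears any leftover grads from before freezing
    loss_fn(model(x), y).backward()
    optimizer.step()
```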

Thanks!

Hi, yes, setting requires_grad to False is sufficient for the right thing to happen.

Here is a very good explanation.

But just to be sure, here is the source code for the SGD optimizer. As you can see, when updating the parameters, any parameter whose .grad is None (which is the case when requires_grad is False, since no gradient is ever populated) is simply skipped. So it will not be treated as having zero gradient; the optimizer just does nothing for those parameters, and no weight decay is applied to them either.
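If you want to convince yourself, a quick sanity check along these lines (just a sketch with a made-up model, not code from the linked post) shows that the frozen layer’s .grad stays None and its weights are untouched by step(), even with weight decay turned on:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
for p in model[0].parameters():
    p.requires_grad = False  # freeze the first layer

# All parameters are given to the optimizer, with weight decay enabled.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.1)

frozen_before = model[0].weight.clone()
loss = model(torch.randn(8, 4)).sum()
loss.backward()

print(model[0].weight.grad)   # None: no gradient was ever populated
optimizer.step()

# The frozen weights are identical, so no weight decay was applied.
print(torch.equal(model[0].weight, frozen_before))  # True
```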

Thanks @Matias_Vasquez!
