I have a network where I want to backprop the losses **only to the initial layers**, and I do not want to update the later layers nearer to where the losses are computed. Currently I'm doing this by calling `requires_grad_(False)` on those layers' parameters, but this completely stops the gradient computation.
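For reference, here is a minimal sketch of what I'm doing now (the model and the early/later split are made up for illustration; my real network is bigger):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for my network: the first layer is the "early"
# part I want to train, the rest are the "later" layers near the loss.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Linear(16, 16),
    nn.Linear(16, 1),
)

# My current approach: freeze every parameter of the later layers.
for layer in list(model)[1:]:
    for p in layer.parameters():
        p.requires_grad_(False)

print(model[0].weight.requires_grad)  # True
print(model[1].weight.requires_grad)  # False
```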

So if an earlier layer depends on a later layer for its gradients, it can no longer get them. I'm hoping to decouple the gradient computation from the gradient update, but it seems that when I call `backward()` on the losses, the autograd package does the computation and the update together. Is there a way for me to compute the gradients but **not** apply the updates?

I'm currently thinking of setting the LR to 0 for those layers/tensors and non-zero for the others. Is there a more straightforward way?
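A minimal sketch of the LR-0 idea using per-parameter-group learning rates (again with a made-up two-layer model): gradients are computed for everything, but the `lr=0.0` group's weights never move.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 1))

# One optimizer, two parameter groups: a real LR for the early layer,
# LR 0 for the later one so its gradients are computed but never applied.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 0.1},
        {"params": model[1].parameters(), "lr": 0.0},
    ],
    lr=0.1,  # default LR for any group that doesn't specify one
)

x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()

later_before = model[1].weight.clone()
optimizer.step()

# The later layer has gradients, but its weights are unchanged.
print(model[1].weight.grad is not None)            # True
print(torch.equal(model[1].weight, later_before))  # True
```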

If I were to set `requires_grad_` to true for a **tensor**, would I need to zero its gradient after each update to prevent gradient accumulation? Currently there seems to be no built-in support for this.
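To make the accumulation concern concrete, here is a small sketch with a bare leaf tensor (not managed by any optimizer), where I clear the gradient by hand between backward passes:

```python
import torch

# Hypothetical standalone tensor whose gradient I track myself.
t = torch.randn(3, requires_grad=True)

(t * 2).sum().backward()
print(t.grad)  # tensor([2., 2., 2.])

# Without clearing, a second backward() would accumulate into t.grad,
# so I zero it manually between steps...
t.grad.zero_()
# ...or drop it entirely with: t.grad = None

(t * 3).sum().backward()
print(t.grad)  # tensor([3., 3., 3.])
```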