I have a network where I want to backprop the losses **only to the initial layers**, and I do not want to update the later layers nearer to where the losses are computed. Currently I'm doing this by calling `requires_grad_(False)` on those layers' parameters, but this completely stops the gradient computation.
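For reference, here is a minimal sketch of what I'm doing now (the model and the early/later split are made up for illustration; my real network is bigger):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for my network: the first layer is the "early"
# part I want to train, the rest are the "later" layers near the loss.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Linear(16, 16),
    nn.Linear(16, 1),
)

# My current approach: freeze every parameter of the later layers.
for layer in list(model)[1:]:
    for p in layer.parameters():
        p.requires_grad_(False)

print(model[0].weight.requires_grad)  # True
print(model[1].weight.requires_grad)  # False
```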

So if an earlier layer depends on a later layer for its gradients, it can no longer get them. I'm hoping to decouple the gradient computation from the gradient update, but it seems that when I call `backward()` on the losses, the autograd package does the computation and the update together. Is there a way for me to compute the gradients but **not** apply the updates?

I'm currently thinking of setting the LR to 0 for those layers/tensors and non-zero for the others. Is there a more straightforward way?
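A minimal sketch of the LR-0 idea using per-parameter-group learning rates (again with a made-up two-layer model): gradients are computed for everything, but the `lr=0.0` group's weights never move.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 1))

# One optimizer, two parameter groups: a real LR for the early layer,
# LR 0 for the later one so its gradients are computed but never applied.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 0.1},
        {"params": model[1].parameters(), "lr": 0.0},
    ],
    lr=0.1,  # default LR for any group that doesn't specify one
)

x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()

later_before = model[1].weight.clone()
optimizer.step()

# The later layer has gradients, but its weights are unchanged.
print(model[1].weight.grad is not None)            # True
print(torch.equal(model[1].weight, later_before))  # True
```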

If I were to set `requires_grad_` to true for a **tensor**, would I need to zero its gradient after each update to prevent gradient accumulation? Currently there seems to be no built-in support for this.
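To make the accumulation concern concrete, here is a small sketch with a bare leaf tensor (not managed by any optimizer), where I clear the gradient by hand between backward passes:

```python
import torch

# Hypothetical standalone tensor whose gradient I track myself.
t = torch.randn(3, requires_grad=True)

(t * 2).sum().backward()
print(t.grad)  # tensor([2., 2., 2.])

# Without clearing, a second backward() would accumulate into t.grad,
# so I zero it manually between steps...
t.grad.zero_()
# ...or drop it entirely with: t.grad = None

(t * 3).sum().backward()
print(t.grad)  # tensor([3., 3., 3.])
```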