Suppose I have three Linear layers A, B, C, and I set B's requires_grad to False. Can C's loss backpropagate to A?
Yes. Intermediate parameters that don't require gradients do not stop backpropagation, as long as some earlier parameters need gradients.
But if the intermediate parameters don't require gradients, how do the earlier parameters get the loss backpropagated from the later parameters? I don't understand the process.
Does requires_grad=False mean the gradients are computed but just not stored?
Gradients for parameters that are not needed won't be computed (their .grad attribute won't be updated), but the gradient calculation through those layers will still continue if it's needed for earlier layers.
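A minimal sketch of the scenario from the question (the layer sizes and input are made up for illustration): freeze B, run a backward pass, and check which parameters received gradients.

```python
import torch
import torch.nn as nn

# Three Linear layers chained as A -> B -> C.
A = nn.Linear(4, 4)
B = nn.Linear(4, 4)
C = nn.Linear(4, 4)

# Freeze B's parameters: their .grad will not be populated.
for p in B.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 4)
loss = C(B(A(x))).sum()
loss.backward()

# A still gets gradients even though B is frozen, because autograd
# backpropagates through B's forward computation (its activations);
# it only skips accumulating gradients into B's own parameters.
print(A.weight.grad is not None)  # True
print(B.weight.grad is None)      # True
print(C.weight.grad is not None)  # True
```

So the chain rule is still applied through B's operations to reach A; requires_grad=False only skips the leaf-gradient accumulation for B's own weights.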