How to separate the effect of gradients from different losses


I have a network like out = h2(h1(x)). I have two losses, l1 and l2, both computed from out. Is it possible to
have the gradients of l2 act only on h2 and the gradients of l1 act only on h1, given that h1's output is the input of h2? At first I thought it might be possible by defining different optimizers, but the problem is that the gradients from both losses will be accumulated for h1 (if I understand autograd correctly), and apparently detach() is not a solution.

I think I figured it out: I need to do it sequentially with two optimizers, but each time I need to zero the gradients.

Yes, exactly.
You can backward one loss, step the optimizer for that network, zero the gradients, then backward the second loss and step the optimizer for the second network.
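A minimal sketch of that pattern in PyTorch (the module sizes, losses, and data below are made-up placeholders for illustration). Each optimizer holds only its own sub-network's parameters, so even though each backward() fills gradients on both modules, each step() updates only one of them; zeroing in between keeps the gradients from mixing. The forward pass is recomputed before the second backward, since reusing the first graph with retain_graph=True can fail once opt1.step() has modified h1's parameters in place.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for h1 and h2.
h1 = nn.Linear(4, 8)
h2 = nn.Linear(8, 2)

# One optimizer per sub-network: each step touches only its own parameters.
opt1 = torch.optim.SGD(h1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(h2.parameters(), lr=0.1)

x = torch.randn(16, 4)
target = torch.randn(16, 2)

# --- l1: update h1 only ---
opt1.zero_grad()
opt2.zero_grad()
out = h2(h1(x))
l1 = ((out - target) ** 2).mean()       # hypothetical first loss
l1.backward()                           # fills grads on h1 AND h2 ...
h2_before = h2.weight.detach().clone()
opt1.step()                             # ... but this updates h1 only
h2_after = h2.weight.detach().clone()   # h2 is untouched by opt1.step()

# --- l2: update h2 only ---
opt1.zero_grad()                        # discard h1's stale grads from l1
opt2.zero_grad()
out = h2(h1(x))                         # fresh forward pass
l2 = out.abs().mean()                   # hypothetical second loss
l2.backward()
opt2.step()                             # updates h2 only
```

The key point is the zero_grad() calls between the two backward/step pairs: without them, l1's gradients on h1 would still be sitting in h1's .grad buffers when l2's gradients are added, which is exactly the accumulation the question describes.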