Hi

I have a network of the form out = h2(h1(x)), and two losses, l1 and l2, both computed from out. Is it possible to have the gradients of l2 applied only to h2 and the gradients of l1 applied only to h1, given that the output of h1 is the input of h2? At first I thought this might be possible by defining different optimizers, but the problem is that the gradients from both losses will be accumulated for h1 (if I understand autograd correctly), and apparently detach() is not a solution.
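
To make the setup concrete, here is a minimal sketch of the situation described above, together with one possible way to restrict each loss to one sub-network. Everything specific in it is an assumption for illustration: h1 and h2 are stand-in nn.Linear layers with arbitrary sizes, and l1 and l2 are made-up losses. It uses torch.autograd.grad, which returns gradients for the listed parameters without accumulating into .grad, and then assigns them by hand.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two sub-networks; sizes are arbitrary.
h1 = nn.Linear(4, 3)
h2 = nn.Linear(3, 2)
h1_params = list(h1.parameters())
h2_params = list(h2.parameters())

x = torch.randn(8, 4)
out = h2(h1(x))

# Hypothetical losses, both computed from the same `out`.
l1 = out.pow(2).mean()
l2 = out.abs().mean()

# Gradients of l1 w.r.t. h1 only, and of l2 w.r.t. h2 only.
# autograd.grad does not touch .grad, so nothing is accumulated here.
g1 = torch.autograd.grad(l1, h1_params, retain_graph=True)
g2 = torch.autograd.grad(l2, h2_params, retain_graph=True)

# For comparison only: the contribution of l2 to h1 that we want to exclude.
g_l2_on_h1 = torch.autograd.grad(l2, h1_params, retain_graph=True)

# The behaviour described in the question: calling backward on both losses
# accumulates both contributions into h1's .grad.
l1.backward(retain_graph=True)
l2.backward()
accumulated_h1 = [p.grad.clone() for p in h1_params]

# Overwrite .grad with the restricted gradients before stepping.
for p, g in zip(h1_params, g1):
    p.grad = g
for p, g in zip(h2_params, g2):
    p.grad = g

# Separate optimizers (plain SGD chosen arbitrarily) then see only the
# gradients assigned above.
opt1 = torch.optim.SGD(h1_params, lr=0.1)
opt2 = torch.optim.SGD(h2_params, lr=0.1)
opt1.step()
opt2.step()
```

In this sketch the graph is built once and kept alive with retain_graph=True; whether that is acceptable depends on memory constraints, and a second forward pass is an alternative.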