Thanks a lot for your answer. But in that example the two losses are reduced independently; if I want to reduce them in one loss function, is that still possible?
so the call to .backward fills each parameter's .grad attribute with a value (or accumulates into it if .grad is already populated)
.step then simply applies whatever gradient is stored in .grad to the parameter.
Two optimizer .step calls will both apply the same variable.grad to the variable. If that is what you want (which does not seem to be the case), then what you're asking for will work.
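A minimal sketch of that accumulate-then-apply behavior, assuming a single toy parameter `w` and plain SGD (all the specific names and values here are made up for illustration):

```python
import torch

# One toy parameter and one optimizer over it.
w = torch.ones(1, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss1 = (w * 2).sum()   # d(loss1)/dw = 2
loss1.backward()        # fills w.grad with 2
loss2 = (w * 3).sum()   # d(loss2)/dw = 3
loss2.backward()        # accumulates: w.grad is now 5

# .step applies whatever is sitting in w.grad, regardless of
# which backward calls produced it.
opt.step()              # w <- 1.0 - 0.1 * 5 = 0.5
```

Note that a second optimizer calling .step here would apply the very same accumulated w.grad again, which is rarely what you want.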
Instead, what you seem to want is two separate .grad attributes per variable, each optimized by its respective optimizer.
If you want to do this sort of thing, your best bet is messing with backward hooks.
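One possible shape for that, sketched with tensor hooks (register_hook): stash the gradient of each loss into its own buffer as it flows through, then copy each buffer into .grad right before the corresponding optimizer's .step call. The names `grads`, `current`, and `stash` are hypothetical, invented for this sketch:

```python
import torch

w = torch.ones(1, requires_grad=True)
opt1 = torch.optim.SGD([w], lr=0.1)
opt2 = torch.optim.SGD([w], lr=0.1)

grads = {}           # one stashed gradient per loss
current = ["loss1"]  # which loss is currently backpropagating

def stash(grad):
    # Tensor hook: called with the gradient flowing into w.
    grads[current[0]] = grad.clone()
    return grad

w.register_hook(stash)

(w * 2).sum().backward()   # stashes a gradient of 2 under "loss1"
current[0] = "loss2"
(w * 3).sum().backward()   # stashes a gradient of 3 under "loss2"

# Apply each stashed gradient with its own optimizer by copying it
# into w.grad before the corresponding .step call.
w.grad = grads["loss1"].clone()
opt1.step()                # w: 1.0 -> 0.8
w.grad = grads["loss2"].clone()
opt2.step()                # w: 0.8 -> 0.5
```

This is only one way to wire it up; the key point is that each optimizer sees its own gradient rather than the accumulated sum.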