Backpropagation in a bifurcating architecture


I am trying to train a model with one input and two outputs, where each output is computed by its own convolutional and dense layers within the architecture. I have the model coded up, but I'm curious how to ensure the loss of one output does not affect the gradients of the other branch. Is there a way to specify this when I call loss.backward? Do I need two separate loss functions? Any suggestions?

To clarify, the input image is immediately passed through two separate branches, each consisting of a simple CNN architecture and ending in a classification prediction (the two branches answer two different classification questions).


Your approach should work; the computation graph will make sure that only the corresponding gradients are calculated for each branch.
Could you post your model definition so that we can check it, if you still have doubts?
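As a toy sketch (the layer sizes and class counts here are placeholders, not your actual architecture), you can verify this yourself: calling backward on one branch's loss populates only that branch's gradients, while the other branch's parameters keep grad=None. Summing the two losses before a single backward call would populate both.

```python
import torch
import torch.nn as nn

# Hypothetical two-branch model; all shapes are assumptions for illustration.
class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(4 * 8 * 8, 3))   # 3-class head
        self.branch_b = nn.Sequential(
            nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(4 * 8 * 8, 5))   # 5-class head

    def forward(self, x):
        # The input goes straight into both branches.
        return self.branch_a(x), self.branch_b(x)

model = TwoBranchNet()
x = torch.randn(2, 1, 8, 8)
out_a, out_b = model(x)

criterion = nn.CrossEntropyLoss()
loss_a = criterion(out_a, torch.tensor([0, 1]))
loss_b = criterion(out_b, torch.tensor([2, 4]))

# Backprop only loss_a: branch_b receives no gradients at all.
loss_a.backward()
print(model.branch_a[0].weight.grad is not None)  # True
print(model.branch_b[0].weight.grad is None)      # True
```

For the usual training setup you would compute `loss = loss_a + loss_b` and call `loss.backward()` once; autograd still routes each loss term's gradients only through the branch that produced it.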