What is the behaviour if I don't use an output head in the loss computation?


I have a network which outputs 2 values, x and y. I intend not to use the output y for training. Will the backward pass update the weights of the y branch with garbage values?

Below is just an illustration of the model at hand.
model = common_model -> [head_x, head_y]
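A minimal PyTorch sketch of such a model (all layer names and sizes here are made up purely to mirror the illustration above):

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared trunk feeding two independent output heads."""
    def __init__(self):
        super().__init__()
        self.common = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.head_x = nn.Linear(8, 1)
        self.head_y = nn.Linear(8, 1)

    def forward(self, inp):
        h = self.common(inp)
        # Both heads are computed on every forward pass,
        # but nothing forces both outputs into the loss.
        return self.head_x(h), self.head_y(h)
```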

The optimisers for the x and y heads are both initialised with model.parameters(). And I only call opt_x.zero_grad() before each forward pass.

Is there anything to take care of in such situations, like calling zero_grad() for opt_y as well?


If y is not used in the loss, no gradient will flow back through head_y: its parameters' .grad fields simply stay None, and an optimiser step leaves them unchanged. Only the contribution from x will count. Also, since both optimisers were constructed with model.parameters(), opt_x.zero_grad() already clears the gradients of every parameter in the model, so calling opt_y.zero_grad() as well would be redundant here.
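A quick way to verify this yourself, using a toy two-head setup (all names and sizes are hypothetical):

```python
import torch
import torch.nn as nn

# Shared trunk plus two heads; one optimiser holds every parameter,
# mirroring "both optimisers initialised with model.parameters()".
common = nn.Linear(4, 8)
head_x = nn.Linear(8, 1)
head_y = nn.Linear(8, 1)
params = list(common.parameters()) + list(head_x.parameters()) + list(head_y.parameters())
opt_x = torch.optim.SGD(params, lr=0.1)

h = torch.relu(common(torch.randn(2, 4)))
x, y = head_x(h), head_y(h)   # y is computed but never enters the loss

opt_x.zero_grad()
loss = x.pow(2).mean()
loss.backward()

print(head_x.weight.grad is not None)  # True: gradient flowed through head_x
print(head_y.weight.grad)              # None: no gradient ever reached head_y

before = head_y.weight.clone()
opt_x.step()
print(torch.equal(head_y.weight, before))  # True: step left head_y untouched
```

Optimisers skip parameters whose .grad is None, which is why the step leaves head_y exactly as it was.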