I have a simple question that I fail to understand: how does the loss, which is usually a single value, propagate correctly in a network with multiple output nodes?
I'll give an example. Assume I have a network with X inputs and 2 output nodes. The network gets pictures of cats, dogs, or something else, and should output a high value in output node 1 for a dog (a low value otherwise) and a high value in output node 2 for a cat (a low value otherwise).
Assuming I use MSE as my loss function, or some other loss function that outputs a single value (e.g. CrossEntropyLoss), how does the loss propagate correctly to each node?
For example, say I give the network a batch with 1 picture of a cat, and I get back a vector of 2 values (one per output node), which is (1, 1). Since node-2 (the cat node) is already correct, I do not need to propagate any error for it, but for node-1 I do. Yet MSE gives me a single loss value, so how does this single value propagate correctly to each node?
My intuition would be that I'd have a loss vector: a loss for node-1 and a loss for node-2, with each loss propagating to its own node. But as I said, MSE gives back a single loss value.
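To make my example concrete, here is a minimal numpy sketch of the scenario (the target encoding (0, 1) for a cat is my assumption). It computes the scalar MSE and then the gradient of that scalar with respect to each output node by hand:

```python
import numpy as np

# One cat picture; targets encoded as (dog = 0, cat = 1) -- my assumption
output = np.array([1.0, 1.0])   # network output: node 1 = dog, node 2 = cat
target = np.array([0.0, 1.0])   # ground truth for a cat picture

# MSE collapses the per-node errors into a single scalar
loss = np.mean((output - target) ** 2)

# Gradient of that scalar w.r.t. each output node:
# dL/dy_i = 2 * (y_i - t_i) / N  -- one component per node
grad = 2 * (output - target) / output.size

print(loss)  # 0.5
print(grad)  # [1. 0.] -> node 1 gets an error signal, node 2 gets none
```

So even though `loss` is a single number, its gradient with respect to the output vector has a separate component per node, which is exactly the per-node "loss vector" my intuition expects.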