I am working on a task where each example has multiple targets.
My approach to computing the loss is to map each example to an embedding, expand the embedding to shape (number of targets, hidden_size), and then compute the cross-entropy loss against all targets.
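For reference, here is a minimal sketch of what I mean (module names, dimensions, and the linear encoder/classifier are placeholders for my actual model):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
input_size, hidden_size, num_classes, num_targets = 8, 16, 10, 3

encoder = nn.Linear(input_size, hidden_size)      # maps an example to an embedding
classifier = nn.Linear(hidden_size, num_classes)  # maps embedding to class logits

x = torch.randn(1, input_size)                            # a single example
targets = torch.randint(0, num_classes, (num_targets,))   # its multiple targets

emb = encoder(x)                                  # shape: (1, hidden_size)
emb = emb.expand(num_targets, hidden_size)        # shape: (num_targets, hidden_size)
logits = classifier(emb)                          # shape: (num_targets, num_classes)

# cross_entropy reduces (mean by default) over the target dimension
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
```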
My question is: does each target's loss affect the backward pass?
Thanks in advance!
Could you give us a small code example, so that we can have a look at whether and how the backward pass could be affected?
As @ptrblck suggests, it would help if you shared a code example.
If I understand correctly, your problem is one of multi-task learning. You can create separate sub-tasks (i.e. separate classifiers) for each target, aggregate the losses (sum/avg) from each classifier, and backpropagate the aggregated loss. The prediction error from each of the tasks will then update the shared network weights.
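A minimal sketch of that idea, assuming a shared encoder with one classification head per target (all module names and sizes are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 3 sub-tasks, 10 classes each
input_size, hidden_size, num_classes, num_tasks = 8, 16, 10, 3

shared = nn.Linear(input_size, hidden_size)                  # shared encoder
heads = nn.ModuleList(
    [nn.Linear(hidden_size, num_classes) for _ in range(num_tasks)]
)                                                            # one head per target

x = torch.randn(4, input_size)                    # batch of 4 examples
targets = torch.randint(0, num_classes, (4, num_tasks))  # one target per task

features = torch.relu(shared(x))
losses = [
    nn.functional.cross_entropy(head(features), targets[:, i])
    for i, head in enumerate(heads)
]
loss = torch.stack(losses).sum()   # aggregate: sum (mean would also work)
loss.backward()                    # every task's error flows into the shared weights
```

Because every per-task loss contributes to the aggregated scalar, each one produces gradients for its own head and for the shared encoder.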