I am working on a task where each example has multiple targets.
My approach to computing the loss is to map each example to an embedding, expand the embedding to shape (number of targets, hidden_size), and then compute the cross-entropy loss against all targets.
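For reference, here is a minimal sketch of what I mean (module names, dimensions, and the linear encoder/classifier are placeholders for my actual model):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
input_size, hidden_size, num_classes, num_targets = 8, 16, 10, 3

encoder = nn.Linear(input_size, hidden_size)      # maps an example to an embedding
classifier = nn.Linear(hidden_size, num_classes)  # maps embedding to class logits

x = torch.randn(1, input_size)                            # a single example
targets = torch.randint(0, num_classes, (num_targets,))   # its multiple targets

emb = encoder(x)                                  # shape: (1, hidden_size)
emb = emb.expand(num_targets, hidden_size)        # shape: (num_targets, hidden_size)
logits = classifier(emb)                          # shape: (num_targets, num_classes)

# cross_entropy reduces (mean by default) over the target dimension
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
```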
My question is: does each target's loss affect the backward pass?
Thanks in advance!
Could you give us a small code example, so that we can have a look at whether and how the backward pass could be affected?
As @ptrblck suggests, it would help if you shared a code example.
If I understand correctly, your problem is one of multi-task learning. You can create separate sub-tasks (i.e. separate classifiers) for each target, aggregate the losses (sum/avg) from each classifier, and backpropagate the aggregated loss. The prediction error from each of the tasks will then update the shared network weights.
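A minimal sketch of that idea, assuming a shared encoder with one classification head per target (all module names and sizes are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 3 sub-tasks, 10 classes each
input_size, hidden_size, num_classes, num_tasks = 8, 16, 10, 3

shared = nn.Linear(input_size, hidden_size)                  # shared encoder
heads = nn.ModuleList(
    [nn.Linear(hidden_size, num_classes) for _ in range(num_tasks)]
)                                                            # one head per target

x = torch.randn(4, input_size)                    # batch of 4 examples
targets = torch.randint(0, num_classes, (4, num_tasks))  # one target per task

features = torch.relu(shared(x))
losses = [
    nn.functional.cross_entropy(head(features), targets[:, i])
    for i, head in enumerate(heads)
]
loss = torch.stack(losses).sum()   # aggregate: sum (mean would also work)
loss.backward()                    # every task's error flows into the shared weights
```

Because every per-task loss contributes to the aggregated scalar, each one produces gradients for its own head and for the shared encoder.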