Let’s say there are two losses, and the losses share some parts of the architecture (e.g. a feature extractor for images).
- Cross Entropy: the loss value would be something small, say 0.25
- MSE: the loss value would be something larger, say 20.4
```python
# Let the below variables be tensors holding the graph for calculating the loss
# cross_entropy_loss = 0.25
# mse_loss = 20.4
```
I’m currently doing this:
```python
tot_loss = cross_entropy_loss + mse_loss
tot_loss.backward()
```
Is this OK, or is it recommended to normalize (or weight) the loss values so that both losses contribute gradients of comparable magnitude to the shared parameters?
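For context, here is a minimal sketch of the setup being asked about: a shared extractor with two heads, one cross-entropy loss and one MSE loss, combined with a fixed weight on the larger loss. All module names, shapes, and the weight value are assumptions for illustration, not from the original post; in practice the weight is usually tuned on a validation set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shared feature extractor feeding two task heads
# (names and dimensions are illustrative assumptions).
extractor = nn.Linear(8, 16)
class_head = nn.Linear(16, 4)    # supervised by cross entropy
regress_head = nn.Linear(16, 1)  # supervised by MSE

x = torch.randn(32, 8)
labels = torch.randint(0, 4, (32,))
targets = torch.randn(32, 1)

features = extractor(x)
cross_entropy_loss = nn.functional.cross_entropy(class_head(features), labels)
mse_loss = nn.functional.mse_loss(regress_head(features), targets)

# With a plain sum, the loss with the larger magnitude tends to dominate the
# gradients flowing into the shared extractor. A fixed scalar weight is a
# simple way to rebalance; 0.1 here is an arbitrary example value.
mse_weight = 0.1
tot_loss = cross_entropy_loss + mse_weight * mse_loss
tot_loss.backward()

# After backward(), both loss terms have contributed gradients to the
# shared extractor's parameters.
print(extractor.weight.grad is not None)  # → True
```

Note that `backward()` on the weighted sum is mathematically identical to calling `backward()` on each weighted loss separately and letting the gradients accumulate, so summing first is the usual idiom.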