Should multi-task losses be normalized?

Let’s say there are two losses, and both share part of the architecture (e.g. a feature extractor for images):

  • Cross Entropy
  • MSE

The scalar value of the cross-entropy loss would be something small, say 0.25.
The scalar value of the MSE loss would be something larger, say 20.4.

# Let the variables below be tensors holding the computation graph for each loss
# cross_entropy_loss = 0.25
# mse_loss = 20.4
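
For context, here is a minimal self-contained sketch of how those two loss tensors might be produced (the module and tensor names are made up for illustration, assuming PyTorch):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared feature extractor, plus one head per task (illustrative sizes only)
shared_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
cls_head = nn.Linear(128, 10)   # classification head -> cross entropy
reg_head = nn.Linear(128, 1)    # regression head -> MSE

# Dummy batch of images and targets
images = torch.randn(8, 3, 32, 32)
class_targets = torch.randint(0, 10, (8,))
reg_targets = torch.randn(8, 1)

features = shared_encoder(images)
cross_entropy_loss = F.cross_entropy(cls_head(features), class_targets)
mse_loss = F.mse_loss(reg_head(features), reg_targets)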

I’m currently doing this:

tot_loss = cross_entropy_loss + mse_loss 
tot_loss.backward()

Is this OK? Or is it recommended to normalize (or reweight) the losses so that both contribute comparable gradient magnitudes during backpropagation?
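
For example, is it better to do something like the following instead (fixed, hand-tuned weights, just a sketch of what I mean by normalizing)?

# Alternative to the plain sum above: scale each loss so both end up on a
# similar order of magnitude. The weights here are made-up values for
# illustration, not tuned numbers.
ce_weight = 1.0
mse_weight = 0.01   # roughly rescales the ~20.4 MSE to the same scale as the CE

tot_loss = ce_weight * cross_entropy_loss + mse_weight * mse_loss
tot_loss.backward()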