Should multi-task losses be normalized?

Let’s say there are two losses, and both share part of the architecture (e.g. a feature extractor for images):

  • Cross Entropy
  • MSE

The scalar value of the cross-entropy loss would be something small, say 0.25.
The scalar value of the MSE loss would be something larger, say 20.4.

# Let the variables below be tensors holding the computation graph for each loss
# cross_entropy_loss = 0.25
# mse_loss = 20.4
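
For context, here is a minimal self-contained sketch of how those two loss tensors might be produced (the module and tensor names are made up for illustration, assuming PyTorch):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared feature extractor, plus one head per task (illustrative sizes only)
shared_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
cls_head = nn.Linear(128, 10)   # classification head -> cross entropy
reg_head = nn.Linear(128, 1)    # regression head -> MSE

# Dummy batch of images and targets
images = torch.randn(8, 3, 32, 32)
class_targets = torch.randint(0, 10, (8,))
reg_targets = torch.randn(8, 1)

features = shared_encoder(images)
cross_entropy_loss = F.cross_entropy(cls_head(features), class_targets)
mse_loss = F.mse_loss(reg_head(features), reg_targets)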

I’m currently doing this:

tot_loss = cross_entropy_loss + mse_loss 
tot_loss.backward()

Is this OK? Or is it recommended to normalize (or reweight) the losses so that both contribute comparable gradient magnitudes during backpropagation?
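
For example, is it better to do something like the following instead (fixed, hand-tuned weights, just a sketch of what I mean by normalizing)?

# Alternative to the plain sum above: scale each loss so both end up on a
# similar order of magnitude. The weights here are made-up values for
# illustration, not tuned numbers.
ce_weight = 1.0
mse_weight = 0.01   # roughly rescales the ~20.4 MSE to the same scale as the CE

tot_loss = ce_weight * cross_entropy_loss + mse_weight * mse_loss
tot_loss.backward()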