Let’s say there are two losses, and they share part of the architecture (e.g., a feature extractor for images):

- Cross Entropy
- MSE

The value of the cross-entropy loss would be something low, say 0.25.

The value of the MSE loss would be something much larger, say 20.4.

```
# Let the below variables be tensors holding the graph for calculating the loss
# cross_entropy_loss = 0.25
# mse_loss = 20.4
```

I’m currently doing this:

```
tot_loss = cross_entropy_loss + mse_loss
tot_loss.backward()
```

Is this OK? Or is it recommended to normalize (or weight) the losses so that both contribute gradients of comparable magnitude to the shared layers, rather than letting the larger MSE loss dominate the backprop signal?
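To make the question concrete, here is a minimal sketch of what I mean by weighting the losses. The weights `lambda_ce` and `lambda_mse` are hypothetical hyperparameters I made up for illustration, and the scalar tensors stand in for the real graph-holding losses:

```
import torch

# Hypothetical weights (assumptions for illustration, not tuned values)
lambda_ce, lambda_mse = 1.0, 0.25

# Toy scalars standing in for the real losses that hold the graph
cross_entropy_loss = torch.tensor(0.25, requires_grad=True)
mse_loss = torch.tensor(20.4, requires_grad=True)

# Weighted sum instead of a plain sum
tot_loss = lambda_ce * cross_entropy_loss + lambda_mse * mse_loss
tot_loss.backward()

# d(tot_loss)/d(loss_i) is just that loss's weight, so the weights
# directly scale the gradient magnitudes flowing into shared layers
print(cross_entropy_loss.grad)  # tensor(1.)
print(mse_loss.grad)            # tensor(0.2500)
```

With a plain sum both weights are effectively 1, so the MSE term’s larger gradients would dominate the shared feature extractor.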