Automatic Mixed Precision Sum of different losses

Hi,

I have a question regarding mixed precision training when using a more complex loss that is the sum of individual loss terms. The mixed precision tutorial says that you have to call the scaler on each of the individual losses; however, that example also uses separate models and optimizers.
If I only have a single model and want to optimize the sum of two losses, e.g. some additional regularization term on top of cross-entropy, is there a difference between calling the scaler on the sum versus calling it on each of the individual losses?

Thanks a lot,
Max

You could use a single GradScaler and scale the final loss as described here.
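Since a single GradScaler applies the same scale factor within an iteration, scaling the summed loss is mathematically equivalent to scaling each term and summing afterwards, so scaling the final sum once is the simpler option. Below is a minimal sketch of that approach; the model, the dummy data, and the L2 regularization term are placeholders for illustration only, and it assumes a CUDA device is available.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model, optimizer, and data; substitute your own.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()

inputs = torch.randn(8, 10, device='cuda')
targets = torch.randint(0, 2, (8,), device='cuda')

optimizer.zero_grad()
with autocast():
    outputs = model(inputs)
    ce_loss = criterion(outputs, targets)
    # Example regularization term: small L2 penalty on the weights
    reg_loss = 1e-4 * sum(p.pow(2).sum() for p in model.parameters())
    loss = ce_loss + reg_loss              # sum the individual terms first

scaler.scale(loss).backward()              # scale the combined loss once
scaler.step(optimizer)
scaler.update()
```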