AMP GradScale multiple losses independently

I am wondering whether I am misunderstanding the Automatic Mixed Precision examples — PyTorch 1.10.0 documentation, or whether the use case shown there is pointless.

GradScaler has a single scale factor. Instead of scaling the two losses separately like in the example, I could just add the losses and then scale the sum.
However, in my case the two losses can differ greatly in magnitude, so different scale factors could make sense. Should I just use two GradScalers then?

Something like:

from torch.cuda.amp import GradScaler

grad_scaler_1 = GradScaler(enabled=True)
grad_scaler_2 = GradScaler(enabled=True)

loss_1_scaled = grad_scaler_1.scale(loss_1)
loss_2_scaled = grad_scaler_2.scale(loss_2)

loss = loss_1_scaled + loss_2_scaled



I wouldn't call the example useless, but yes, you can use multiple GradScalers if that fits your use case better. One caveat: if both losses backpropagate into the same parameters, the summed gradients would carry two different scale factors, and neither scaler could unscale them correctly. Keeping each scaled loss on its own backward pass and its own optimizer avoids that.
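A minimal sketch of the two-scaler setup, assuming the two losses drive two separate models/optimizers (an assumption; the question doesn't say whether the losses share parameters). Each scaled loss gets its own backward pass, so each gradient carries exactly one scale factor and can be unscaled by its own scaler. `enabled` is keyed off CUDA availability so the code also runs (without scaling) on CPU:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# GradScaler only does real scaling on CUDA; enabled=False makes it a no-op.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Assumption: two separate models, each trained by its own loss and optimizer.
model_1 = torch.nn.Linear(8, 1).to(device)
model_2 = torch.nn.Linear(8, 1).to(device)
opt_1 = torch.optim.SGD(model_1.parameters(), lr=0.1)
opt_2 = torch.optim.SGD(model_2.parameters(), lr=0.1)

# One scaler per loss, so the scale factors can adapt independently.
scaler_1 = GradScaler(enabled=use_amp)
scaler_2 = GradScaler(enabled=use_amp)

x = torch.randn(4, 8, device=device)
target = torch.randn(4, 1, device=device)

opt_1.zero_grad()
opt_2.zero_grad()

with autocast(enabled=use_amp):
    loss_1 = torch.nn.functional.mse_loss(model_1(x), target)
    loss_2 = torch.nn.functional.mse_loss(model_2(x), target)

# Separate backward passes: gradients of model_1 carry only scaler_1's
# scale, gradients of model_2 only scaler_2's.
scaler_1.scale(loss_1).backward()
scaler_2.scale(loss_2).backward()

# Each scaler unscales and steps its own optimizer, then updates its scale.
scaler_1.step(opt_1)
scaler_2.step(opt_2)
scaler_1.update()
scaler_2.update()
```

If the losses did feed shared parameters, a single GradScaler with one scale factor (as in the documentation example) would be the safer choice.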