AMP GradScale multiple losses independently

I am wondering whether I am misunderstanding the Automatic Mixed Precision examples — PyTorch 1.10.0 documentation, or whether the use case shown there is pointless.

GradScaler has a single scale factor. Instead of scaling the two losses separately like in the example, I could just add the losses and then scale the sum.
However, in my case the two losses can differ greatly in magnitude, so different scale factors could make sense. Should I just use two GradScalers then?

Something like:

from torch.cuda.amp import GradScaler

grad_scaler_1 = GradScaler(enabled=True)
grad_scaler_2 = GradScaler(enabled=True)

loss_1_scaled = grad_scaler_1.scale(loss_1)
loss_2_scaled = grad_scaler_2.scale(loss_2)

loss = loss_1_scaled + loss_2_scaled



I wouldn't call the example useless, but yes, you can use multiple GradScalers if that fits your use case better. One caveat: if both losses backpropagate into the same parameters, the summed gradients would carry two different scale factors, and neither scaler could unscale them correctly. Keeping each scaled loss on its own backward pass and its own optimizer avoids that.
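A minimal sketch of the two-scaler setup, assuming the two losses drive two separate models/optimizers (an assumption; the question doesn't say whether the losses share parameters). Each scaled loss gets its own backward pass, so each gradient carries exactly one scale factor and can be unscaled by its own scaler. `enabled` is keyed off CUDA availability so the code also runs (without scaling) on CPU:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# GradScaler only does real scaling on CUDA; enabled=False makes it a no-op.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Assumption: two separate models, each trained by its own loss and optimizer.
model_1 = torch.nn.Linear(8, 1).to(device)
model_2 = torch.nn.Linear(8, 1).to(device)
opt_1 = torch.optim.SGD(model_1.parameters(), lr=0.1)
opt_2 = torch.optim.SGD(model_2.parameters(), lr=0.1)

# One scaler per loss, so the scale factors can adapt independently.
scaler_1 = GradScaler(enabled=use_amp)
scaler_2 = GradScaler(enabled=use_amp)

x = torch.randn(4, 8, device=device)
target = torch.randn(4, 1, device=device)

opt_1.zero_grad()
opt_2.zero_grad()

with autocast(enabled=use_amp):
    loss_1 = torch.nn.functional.mse_loss(model_1(x), target)
    loss_2 = torch.nn.functional.mse_loss(model_2(x), target)

# Separate backward passes: gradients of model_1 carry only scaler_1's
# scale, gradients of model_2 only scaler_2's.
scaler_1.scale(loss_1).backward()
scaler_2.scale(loss_2).backward()

# Each scaler unscales and steps its own optimizer, then updates its scale.
scaler_1.step(opt_1)
scaler_2.step(opt_2)
scaler_1.update()
scaler_2.update()
```

If the losses did feed shared parameters, a single GradScaler with one scale factor (as in the documentation example) would be the safer choice.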