I am wondering whether I am misunderstanding the "Automatic Mixed Precision examples" page of the PyTorch 1.10.0 documentation, or whether the use case shown there is pointless.
A GradScaler has a single scale factor, so instead of scaling twice as in the example, I could simply add the losses and then scale the sum once.
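To make this concrete, here is a minimal sketch of that single-scaler variant, following the pattern from the AMP docs. The model, data, and second loss are placeholders I made up for illustration; `enabled` is toggled so the snippet also runs as a no-op on a CPU-only machine:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Placeholder model and data, just to make the sketch runnable.
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = GradScaler(enabled=use_amp)  # pass-through when enabled=False

x = torch.randn(8, 4, device=device)
target = torch.randn(8, 2, device=device)

optimizer.zero_grad()
with autocast(enabled=use_amp):
    out = model(x)
    loss_1 = torch.nn.functional.mse_loss(out, target)
    loss_2 = out.abs().mean()  # stand-in for a second objective
    loss = loss_1 + loss_2     # sum first ...

scaler.scale(loss).backward()  # ... then scale the sum once
scaler.step(optimizer)
scaler.update()
```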
However, what matters in my case is that the two losses can differ greatly in magnitude, so separate scale factors might make sense. Should I just use two GradScalers then?
```python
grad_scaler_1 = GradScaler(enabled=True)
grad_scaler_2 = GradScaler(enabled=True)

loss_1_scaled = grad_scaler_1.scale(loss_1)
loss_2_scaled = grad_scaler_2.scale(loss_2)
loss = loss_1_scaled + loss_2_scaled
loss.backward()
...
```