"If your network has multiple losses, you must call scaler.scale on each of them individually"

This is from the amp documentation: Automatic Mixed Precision examples, PyTorch 1.11.0 documentation

This is talking about the scenario where one calls backward() separately on multiple loss tensors, correct?

Not the scenario where one adds the different losses together into one total_loss and calls backward() on that, right?

Yes, your understanding is correct. If you accumulate the losses into a single total_loss first, one scaler.scale(total_loss).backward() call is sufficient; you only need to scale each loss individually when you call backward() on them separately.
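A minimal sketch of why accumulating first works: scaler.scale just multiplies the loss by the current scale factor, and multiplication is linear, so scaling the sum equals summing the individually scaled losses. The ToyScaler below is a hypothetical stand-in for torch.cuda.amp.GradScaler that illustrates only this arithmetic (it omits the real scaler's inf/nan checks, unscaling, and scale updates).

```python
# Hypothetical stand-in for torch.cuda.amp.GradScaler, illustrating
# only the scaling arithmetic, not the full AMP machinery.
class ToyScaler:
    def __init__(self, init_scale=2.0 ** 16):
        self._scale = init_scale  # GradScaler's default initial scale

    def scale(self, loss):
        # GradScaler.scale multiplies the loss by the current scale factor
        return loss * self._scale


scaler = ToyScaler()
loss1, loss2 = 0.25, 0.75

# Pattern 1: separate backward() calls, so each loss is scaled individually
scaled_separately = scaler.scale(loss1) + scaler.scale(loss2)

# Pattern 2: accumulate into one total_loss, then scale it once
total_loss = loss1 + loss2
scaled_together = scaler.scale(total_loss)

# Scaling is linear, so both patterns feed identical scaled values
# into backward(), and hence produce identical scaled gradients
assert scaled_separately == scaled_together
```

In real code the difference is only in how many backward() calls you make: with one accumulated total_loss, a single scaler.scale(total_loss).backward() covers everything.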