"If your network has multiple losses, you must call scaler.scale on each of them individually"

This is from the amp documentation: Automatic Mixed Precision examples, PyTorch 1.11.0 documentation

This is talking about the scenario where one calls backward() separately on multiple loss tensors, correct?

Not the scenario where one adds the different losses together into one total_loss and calls backward() on that, right?

Yes, your understanding is correct. If you accumulate the losses into a single total_loss first, one scaler.scale(total_loss).backward() call is sufficient; you only need to scale each loss individually when you call backward() on them separately.
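A minimal sketch of why accumulating first works: scaler.scale just multiplies the loss by the current scale factor, and multiplication is linear, so scaling the sum equals summing the individually scaled losses. The ToyScaler below is a hypothetical stand-in for torch.cuda.amp.GradScaler that illustrates only this arithmetic (it omits the real scaler's inf/nan checks, unscaling, and scale updates).

```python
# Hypothetical stand-in for torch.cuda.amp.GradScaler, illustrating
# only the scaling arithmetic, not the full AMP machinery.
class ToyScaler:
    def __init__(self, init_scale=2.0 ** 16):
        self._scale = init_scale  # GradScaler's default initial scale

    def scale(self, loss):
        # GradScaler.scale multiplies the loss by the current scale factor
        return loss * self._scale


scaler = ToyScaler()
loss1, loss2 = 0.25, 0.75

# Pattern 1: separate backward() calls, so each loss is scaled individually
scaled_separately = scaler.scale(loss1) + scaler.scale(loss2)

# Pattern 2: accumulate into one total_loss, then scale it once
total_loss = loss1 + loss2
scaled_together = scaler.scale(total_loss)

# Scaling is linear, so both patterns feed identical scaled values
# into backward(), and hence produce identical scaled gradients
assert scaled_separately == scaled_together
```

In real code the difference is only in how many backward() calls you make: with one accumulated total_loss, a single scaler.scale(total_loss).backward() covers everything.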