Do I need to save the state_dict of torch.cuda.amp.GradScaler and reload it to resume training? The docs say it dynamically estimates the scale factor each iteration, so I never saved it. So, will model.load_state_dict and optimizer.load_state_dict suffice?
If you want to restore the last scale factor (as well as the backoff and growth factors, if you changed them), then you should restore the scaler's state_dict as well.
Your training should also work without restoring the gradient scaler, but it will most likely not reproduce the results of a run without interruptions, since a freshly created gradient scaler could skip iterations at different steps.
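A minimal sketch of saving and restoring the scaler alongside the model and optimizer (the layer, checkpoint keys, and file name are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical minimal setup for illustration.
model = nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

# Save: store the scaler's state_dict together with model and optimizer.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),  # current scale, growth/backoff factors, growth tracker
}
torch.save(checkpoint, "checkpoint.pt")

# Resume: restore all three so training continues with the last scale factor.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scaler.load_state_dict(checkpoint["scaler"])
```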