Do I need to save the state_dict of GradScaler?

Do I need to save the state_dict of torch.cuda.amp.GradScaler and reload it to resume training? The docs say it dynamically estimates the scale factor each iteration, so I have never saved it. Will model.load_state_dict and optimizer.load_state_dict suffice?


If you want to restore the last scale factor (as well as the backoff and growth factors, if you changed them), then you should restore its state_dict.
Training will also work without restoring the gradient scaler, but it will most likely not reproduce the results of an uninterrupted run, since a freshly initialized scaler could skip iterations at different steps.
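As a minimal sketch (the file name, model, and optimizer here are just placeholders), you can save and restore the scaler state alongside the model and optimizer like this:

```python
import torch

# Placeholder model/optimizer/scaler for illustration
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

# ... train for some iterations ...

# Saving: include the scaler state in the checkpoint
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),  # current scale, growth/backoff factors, growth tracker
}
torch.save(checkpoint, "checkpoint.pt")

# Resuming: restore all three before continuing training
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scaler.load_state_dict(checkpoint["scaler"])
```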