Is there any documentation, or are there special considerations, when switching between Amp and non-Amp (in either direction) for both training and inference?
The documentation states the following, but the effects it would have on the model are still not clear:
" If a checkpoint was created from a run without Amp, and you want to resume training with Amp, load model and optimizer states from the checkpoint as usual. The checkpoint won’t contain a saved scaler state, so use a fresh instance of
GradScaler
.If a checkpoint was created from a run with Amp and you want to resume training without Amp, load model and optimizer states from the checkpoint as usual, and ignore the saved scaler state."