@ptrblck Just checked the documentation of the GradScaler class and found this:
The scale factor often causes infs/NaNs to appear in gradients for the first few iterations as its value calibrates. scaler.step will skip the underlying optimizer.step() for these iterations. After that, step skipping should occur rarely (once every few hundred or thousand iterations).
Could this be the cause of such warnings?
And another question: do you get the scale factor using scaler.get_scale(), where scaler is an instance of torch.cuda.amp.GradScaler?
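
For context, here is a minimal sketch of how I would read the scale factor in a training loop (the model, optimizer, and dummy data are placeholders, and comparing the scale before and after scaler.update() to detect a skipped step is just my assumption about how to observe it):

```python
import torch

# Placeholder model, optimizer, and loss for illustration only.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    # Dummy batch standing in for a real data loader.
    inputs = torch.randn(8, 10, device="cuda")
    targets = torch.randint(0, 2, (8,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)

    scaler.scale(loss).backward()
    scale_before = scaler.get_scale()   # current scale factor
    scaler.step(optimizer)              # skips optimizer.step() if infs/NaNs were found
    scaler.update()
    # Assumption: if update() lowered the scale, the step was skipped this iteration.
    if scaler.get_scale() < scale_before:
        print(f"step {step}: optimizer.step() skipped, scale is now {scaler.get_scale()}")
```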