Model training with automatic mixed precision is not learning

Also, what does “a few iterations” mean here? scaler.step(optimizer) is expected to skip the first few optimizer steps: the initial scale is large, so the scaled gradients can overflow to inf/nan while the scale value calibrates. The parameters are not updated on a skipped step, so the loss would not decrease for those iterations.
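To make the calibration concrete, here is a rough toy model of dynamic loss scaling in plain Python. It is not PyTorch's implementation, only a sketch of the idea. The function name `simulate_loss_scaling` and the constant gradient magnitude are made up for illustration; the default values mirror the documented `GradScaler` defaults (`init_scale=2**16`, `growth_factor=2.0`, `backoff_factor=0.5`, `growth_interval=2000`).

```python
FP16_MAX = 65504.0  # largest finite float16 value


def simulate_loss_scaling(grad_magnitude, num_steps, init_scale=2.0**16,
                          growth_factor=2.0, backoff_factor=0.5,
                          growth_interval=2000):
    """Toy model: return (indices of skipped steps, final scale).

    grad_magnitude is a hypothetical, constant unscaled gradient size.
    """
    scale = init_scale
    good_steps = 0
    skipped = []
    for step in range(num_steps):
        scaled_grad = grad_magnitude * scale
        if scaled_grad > FP16_MAX:
            # Overflow: the optimizer step is skipped and the scale backs off.
            skipped.append(step)
            scale *= backoff_factor
            good_steps = 0
        else:
            # Finite gradients: the step is applied; grow the scale after
            # growth_interval consecutive good steps.
            good_steps += 1
            if good_steps == growth_interval:
                scale *= growth_factor
                good_steps = 0
    return skipped, scale


skipped, final_scale = simulate_loss_scaling(grad_magnitude=4.0, num_steps=10)
print(skipped, final_scale)  # [0, 1, 2] 8192.0
```

In this toy run only the first three steps are skipped, after which every step is applied. In a real training loop, one way to check whether a step was skipped is to compare scaler.get_scale() before scaler.step(optimizer) and after scaler.update(): a smaller value afterwards means inf/nan gradients were found and that step was skipped. If steps keep being skipped well beyond the first few iterations, the problem is probably something other than scale calibration.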
