Model training with automatic mixed precision is not learning

mcarilli · April 7, 2020, 11:06pm

Also, what does “a few iterations” mean here? It’s expected that scaler.step(optimizer) may skip the first few steps due to inf/nan gradients as the scale value calibrates, so the loss would not decrease for those iterations.