Why is the loss_scale getting smaller and smaller?

You could unscale the gradients manually and inspect them to see which ones are overflowing. Unscaling Infs or NaNs will of course leave them invalid, but it should give you an idea of where in the model the gradients start to overflow.
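Something along these lines might help with the inspection. It is only a rough sketch: `find_nonfinite_grads` is a hypothetical helper, the `loss_scale` value of 1024 is made up, and the toy model and the hand-poisoned gradient are just there to make the example run on CPU. In your training loop you would call it after `backward()` with whatever scale your AMP setup currently reports.

```python
import torch
from torch import nn


def find_nonfinite_grads(model: nn.Module, loss_scale: float) -> list:
    """Return names of parameters whose unscaled gradients contain Inf or NaN."""
    bad = []
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        # Unscale a float32 copy so the stored gradient is left untouched.
        unscaled = param.grad.detach().float() / loss_scale
        if not torch.isfinite(unscaled).all():
            bad.append(name)
    return bad


# Toy demonstration: poison one gradient by hand to mimic an overflow.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_scale = 1024.0  # assumed value, use your real current scale here

loss = model(torch.randn(2, 4)).sum()
(loss * loss_scale).backward()  # gradients are now scaled by loss_scale
model[2].weight.grad[0, 0] = float("inf")  # simulated overflow

overflowing = find_nonfinite_grads(model, loss_scale)
print(overflowing)
```

The parameter names it returns follow `named_parameters()`, so the first layers that show up after a skipped step are a reasonable place to start looking. If you are using `torch.cuda.amp.GradScaler`, calling `scaler.unscale_(optimizer)` before the check unscales the gradients in place, in which case you would pass `1.0` as the scale.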
