In PyTorch 2.7 + CUDA 12.8, using AMP causes scaler.step() to throw an error saying that no inf checks were recorded. Why does this happen?

How can I solve it?

With the exact same code, everything works fine on older versions of PyTorch and CUDA.
However, after switching to an RTX 5090 and upgrading to PyTorch 2.7 + CUDA 12.8, I found that this code:

self.scaler = torch.amp.GradScaler(enabled=self.optim_conf.amp.enabled)

self.scaler.step(optim.optimizer)

now throws the following error:

[rank0]: AssertionError: No inf checks were recorded for this optimizer.

Could you post a minimal and executable code snippet reproducing the issue in the latest release?