Debugging nan gradients: what am I doing wrong?

I have not found the exact cause of the NaNs yet, but I have figured out the debugging part of the question.

To see the NaNs printed, I should have registered the hook with scale.register_hook(print) without enabling torch.autograd.set_detect_anomaly(True). Otherwise, anomaly detection stops the program before the backward hook is called.
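
For reference, here is a minimal sketch of the technique. The tensors `x` and `scale` below are hypothetical stand-ins, not my actual model; the torch.where/sqrt pattern is just a well-known way to manufacture a NaN gradient so the hooks have something to show.

```python
import torch

# Hypothetical tensors standing in for whatever part of the model is suspected
# of producing NaN gradients.
x = torch.tensor([0.0, 1.0], requires_grad=True)

# Classic NaN-gradient trap: sqrt is still evaluated at 0 in the forward pass,
# and its backward produces 0 * inf = nan even though the masked value is unused.
scale = torch.where(x > 0, torch.sqrt(x), torch.zeros_like(x))
loss = scale.sum()

# Backward hooks print the gradient flowing into each tensor during backward().
# Keep torch.autograd.set_detect_anomaly(True) off while doing this, otherwise
# the anomaly error aborts the backward pass before the hooks can fire.
scale.register_hook(lambda grad: print("grad into scale:", grad))  # finite: [1., 1.]
x.register_hook(lambda grad: print("grad into x:    ", grad))      # [nan, 0.5]

loss.backward()
```

Registering hooks on several tensors along the chain like this shows where the gradient is still finite and where the NaN first appears, which narrows down the offending operation.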