I'm seeing nan gradients in my model parameters. I enabled
torch.autograd.set_detect_anomaly(True)
and it reports
Function 'DivBackward0' returned nan values in its 1th output.
("1th" is PyTorch's own wording), pointing at this line:
div = x / scale
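For reference, I can reproduce that kind of error in a standalone toy case (made-up tensors, not my real model), where DivBackward0 produces nan gradients even though the loss itself is finite:

```python
import torch

# Toy tensors (not my real data): scale contains a zero.
x = torch.tensor([1.0, 2.0], requires_grad=True)
scale = torch.tensor([0.0, 1.0], requires_grad=True)

div = x / scale   # [inf, 2.0] -- only element 0 is non-finite
loss = div[1]     # the loss only touches the finite element
loss.backward()

# DivBackward0 computes grad_x = grad_div / scale; for element 0 that is
# 0.0 / 0.0 = nan, so nan shows up in the backward pass even though the
# loss was finite.
print(x.grad)      # tensor([nan, 1.])
print(scale.grad)  # tensor([nan, -2.])
```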
So I tried to catch where the nan appears by checking the forward values and registering backward hooks on every tensor involved:
x.requires_grad_()
# Forward check: does x already contain nan/inf?
print(f"x: nan={x.isnan().any().item()}, inf={x.isinf().any().item()}", flush=True)
# Backward check: does the gradient flowing into x contain nan/inf?
x.register_hook(lambda grad: print(f"x grad: nan={grad.isnan().any().item()}, inf={grad.isinf().any().item()}", flush=True))
scale.requires_grad_()
print(f"scale: nan={scale.isnan().any().item()}, inf={scale.isinf().any().item()}", flush=True)
scale.register_hook(lambda grad: print(f"scale grad: nan={grad.isnan().any().item()}, inf={grad.isinf().any().item()}", flush=True))
div = x / scale
# div already requires grad (it is computed from x and scale),
# so no requires_grad_() call is needed here.
print(f"div: nan={div.isnan().any().item()}, inf={div.isinf().any().item()}", flush=True)
div.register_hook(lambda grad: print(f"div grad: nan={grad.isnan().any().item()}, inf={grad.isinf().any().item()}", flush=True))
But every check prints False for both nan and inf, in the forward prints and in the gradient hooks. What am I doing wrong?