Pow returns 'nan' instead of the expected value for second and higher order derivatives

When defining a polynomial y:

import torch as th

n = 3
t = th.zeros(10, 1).requires_grad_()   # inputs, all zeros
poly = th.rand(n)                      # polynomial coefficients
orders = th.arange(n)                  # exponents 0, 1, ..., n-1
y = th.sum(poly * (t ** orders), dim=1, keepdim=True)

and calculating gradients the following way:

y2 = th.autograd.grad(y.sum(), t, create_graph=True)[0]   # dy/dt
y3 = th.autograd.grad(y2.sum(), t, create_graph=True)[0]  # d2y/dt2

any gradient y3 or higher (no matter what n is) comes out as nan when t = zeros, but has the correct value when t is anything else. For t = zeros, y and y2 are still correct: they evaluate to poly[0] and poly[1] respectively.
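
Putting the snippets above together, a compact check (a sketch; the helper name and the choice of t = ones are just for illustration) shows the contrast between zero and nonzero t:

import torch as th

def second_derivative(t_values, n=3):
    # rebuild the polynomial and take two gradients, as in the snippets above
    t = t_values.clone().requires_grad_()
    poly = th.rand(n)
    orders = th.arange(n)
    y = th.sum(poly * (t ** orders), dim=1, keepdim=True)
    y2 = th.autograd.grad(y.sum(), t, create_graph=True)[0]
    return th.autograd.grad(y2.sum(), t, create_graph=True)[0]

print(second_derivative(th.zeros(10, 1)))  # nan entries, as described above
print(second_derivative(th.ones(10, 1)))   # finite values (2 * poly[2] for n = 3)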

Does anyone know what is wrong, or is there a workaround?

Unfortunately this is a known issue, due to the way autograd handles exponents plus the masked semantics of gradients: third-order gradient of torch.pow with tensor args and certain input returns NaN · Issue #89757 · pytorch/pytorch · GitHub.
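
One possible workaround (just a sketch, not something confirmed in the linked issue) is to avoid torch.pow with tensor exponents entirely, e.g. by evaluating the polynomial with Horner's scheme so the autograd graph contains only multiplications and additions:

import torch as th

n = 3
t = th.zeros(10, 1).requires_grad_()
poly = th.rand(n)

# Horner's scheme: y = (...(poly[n-1]*t + poly[n-2])*t + ...)*t + poly[0]
y = th.zeros_like(t)
for c in poly.flip(0):
    y = y * t + c

y2 = th.autograd.grad(y.sum(), t, create_graph=True)[0]   # poly[1] at t = 0
y3 = th.autograd.grad(y2.sum(), t, create_graph=True)[0]  # 2 * poly[2], no nan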


Do you know whether a fix is expected in the future?

I don’t think anyone is actively working on it, unfortunately.
