PyTorch autograd second derivative

Hi. When I was testing the following toy example

import torch

q = torch.rand(10, 2, requires_grad=True)
p = torch.rand(10, 1, requires_grad=True)
input = torch.cat((q, p), dim=1)
u = torch.sum(input, dim=1)
u_q = torch.autograd.grad(u, q, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0]
u_p = torch.autograd.grad(u, p, grad_outputs=torch.ones_like(u), create_graph=True, retain_graph=True)[0]
u_qq = torch.autograd.grad(u_q, q, grad_outputs=torch.ones_like(u_q), create_graph=True, retain_graph=True)[0]  # this line raises the error

I got the error message:

element 0 of tensors does not require grad and does not have a grad_fn

However, if I change u = torch.sum(input, dim=1) to u = torch.sum(input**1, dim=1), the error disappears. Can anyone explain why?

Probably because u is linear w.r.t. its inputs, so the first derivative is a constant tensor that does not depend on q or p, and autograd gives it no grad_fn (even with create_graph=True there is nothing to record). When you use input**1 the pow op is recorded, so the first derivative does have a grad_fn, but the second derivative you get out of it will probably just be zero anyway.
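
You can see this by checking the grad_fn of the first derivative. A minimal sketch of what I mean (I renamed input to x to avoid shadowing the builtin; the second derivative in the **1 case should just come out as zeros):

import torch

q = torch.rand(10, 2, requires_grad=True)
p = torch.rand(10, 1, requires_grad=True)
x = torch.cat((q, p), dim=1)

# Linear case: d(sum(x))/dq is a constant (all ones), so autograd records
# no graph for it, even with create_graph=True.
u = torch.sum(x, dim=1)
u_q = torch.autograd.grad(u, q, grad_outputs=torch.ones_like(u), create_graph=True)[0]
print(u_q.grad_fn)  # None -> a second grad call raises the error above

# With **1 the pow op is recorded, so the first derivative stays attached
# to the graph and the second grad call goes through -- it just returns zeros.
v = torch.sum(x**1, dim=1)
v_q = torch.autograd.grad(v, q, grad_outputs=torch.ones_like(v), create_graph=True)[0]
print(v_q.grad_fn)  # not None
v_qq = torch.autograd.grad(v_q, q, grad_outputs=torch.ones_like(v_q))[0]
print(v_qq)  # all zeros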

I agree. This seems to be what is going on here. But it is strange that one needs to add **1 in order for autograd to work properly for such a simple computation.

I’d actually say that this is expected behavior, since this way you know explicitly that your graph will always produce zero gradients rather than silently getting zeros back. I agree that it can be confusing, though.
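
If you don’t want to sprinkle **1 around, one option is to make the zero case explicit yourself and fall back to a zero tensor whenever the first derivative has no grad_fn. A rough sketch (second_grad is just a helper name I made up, not a PyTorch API):

import torch

def second_grad(u, x, create_graph=False):
    # First derivative; create_graph=True so it can be differentiated again.
    u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    if u_x.grad_fn is None:
        # u is (at most) linear in x, so the second derivative is identically zero.
        return torch.zeros_like(x)
    return torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=create_graph)[0]

q = torch.rand(10, 2, requires_grad=True)
p = torch.rand(10, 1, requires_grad=True)
u = torch.sum(torch.cat((q, p), dim=1), dim=1)
print(second_grad(u, q))  # zeros, no error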