How to get gradients that still have requires_grad=True

Let w and phi be two parameters

import torch as T
from torch.nn import Parameter

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w * phi
wp.backward()
grd = phi.grad   # d(wp)/d(phi) = w = 2.2, but detached from the graph
print(grd)

Printed:

tensor([2.2000])

I want:

tensor([2.2000], requires_grad=True)

i.e. I want phi.grad (which here equals w, a parameter of a larger network) to have requires_grad=True so that I can do

grd.backward()
w.grad

I don’t know how to separate these two computation graphs.

If I’m understanding correctly, you want to compute second-order gradients with respect to w, i.e. you’d like phi.grad itself to carry an autograd graph that reaches back to w (and thus have requires_grad=True).

You could do this by calling wp.backward(create_graph=True).
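
For reference, a minimal sketch of that flow (assuming the same T and Parameter aliases as in your snippet). Note that the first backward also accumulates d(wp)/dw = phi into w.grad, so I clear it before differentiating phi.grad:

import torch as T
from torch.nn import Parameter

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w * phi

wp.backward(create_graph=True)   # build a graph for the backward pass as well
print(phi.grad)                  # tensor([2.2000], grad_fn=...), i.e. differentiable

w.grad = None                    # drop the d(wp)/dw that was just accumulated
phi.grad.backward()              # differentiate phi.grad (numerically equal to w) w.r.t. w
print(w.grad)                    # tensor([1.])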

Thanks for your reply.

I actually want the gradient of wp only with respect to phi, so this worked for me:

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w * phi
grd = T.autograd.grad(wp, phi, create_graph=True)[0]   # d(wp)/d(phi), kept differentiable
print(grd)
grd.backward()   # backprop through grd, accumulating into w.grad
print(w.grad)

output:

tensor([2.2000], grad_fn=<MulBackward0>)
tensor([1.])

Using a modified version of the last method:

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w * phi
wp.backward(create_graph=True)   # populates w.grad and phi.grad, keeping the backward graph
grd = phi.grad
print(grd)
grd.backward()
print(w.grad)

output:

tensor([2.2000], grad_fn=<CopyBackwards>)
tensor([2.5000], grad_fn=<CopyBackwards>)

I don’t know what is going on with the last method.
Also, I found a quote which suggests against using .grad in such cases.

I believe the quote is saying that .backward() is hard to reason about (not .grad()). autograd.grad() is actually the preferred alternative because (by default) it’s more explicit about which inputs it’s computing gradients for. It also returns the gradient instead of performing a side effect like updating .grad.
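
As a side note, and if I'm reading the last snippet correctly, the tensor([2.5000]) comes from gradient accumulation: wp.backward(create_graph=True) already wrote d(wp)/dw = phi = 1.5 into w.grad, and grd.backward() then added d(grd)/dw = 1.0 on top. A minimal sketch of the autograd.grad-only version, which never writes to .grad and so avoids that accumulation (same T and Parameter aliases as above):

import torch as T
from torch.nn import Parameter

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w * phi

# First-order gradient of wp w.r.t. phi, kept differentiable.
grd = T.autograd.grad(wp, phi, create_graph=True)[0]
print(grd)    # tensor([2.2000], grad_fn=...)

# Second-order gradient d(grd)/dw, returned directly rather than
# accumulated into w.grad.
ggrd = T.autograd.grad(grd, w)[0]
print(ggrd)   # tensor([1.])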