Let `w` and `phi` be two parameters:

``````
import torch as T
from torch.nn import Parameter

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w*phi
wp.backward()
grd = phi.grad   # gradient of wp wrt phi
print(grd)
``````

Printed:

``````
tensor([2.2000])
``````

I want:

``````
tensor([2.2000], requires_grad=True)
``````

i.e. I want `phi.grad` (which equals `w`, a parameter of a larger network) to have `requires_grad=True`, so that I can do

``````
grd.backward()
``````

I don’t know how to separate these two computation graphs.

If I’m understanding correctly, you want to compute second-order gradients wrt `w`. So you’d like `phi.grad` itself to have an autograd graph accumulating into `w` (and thus have `requires_grad=True`).

You could do this by doing `wp.backward(create_graph=True)`.
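
For example, with your toy values, something like this should work (the extra `w.grad = None` is just my addition, so the first-order gradient from the first backward doesn’t get mixed into the second-order one):

``````
w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w*phi
wp.backward(create_graph=True)   # phi.grad now carries a grad_fn
print(phi.grad)                  # tensor([2.2000], grad_fn=<CopyBackwards>)
w.grad = None                    # drop the first-order gradient d(wp)/dw accumulated above
phi.grad.backward()              # differentiate phi.grad (= w) wrt w
print(w.grad)                    # tensor([1.])
``````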

I actually want the gradient of `wp` only wrt `phi`, so this worked for me:

``````
w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w*phi
# gradient of wp wrt phi only; create_graph=True makes grd itself differentiable wrt w
grd, = T.autograd.grad(wp, phi, create_graph=True)
print(grd)
grd.backward()
print(w.grad)
``````

output:

``````
tensor([2.2000], grad_fn=<MulBackward0>)
tensor([1.])
``````

Using the last method, modified:

``````
w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w*phi
wp.backward(create_graph=True)
grd = phi.grad   # phi.grad now carries a grad_fn, so it can be differentiated further
print(grd)
grd.backward()
``````

output:

``````
tensor([2.2000], grad_fn=<CopyBackwards>)
``````
Also, I found a quote which suggests against using `.grad` in such cases.
I believe the quote is saying that `.backward()` is hard to reason about (not `torch.autograd.grad()`). `torch.autograd.grad()` is actually the preferred alternative because (by default) it’s more explicit about which inputs it’s computing gradients for. It also returns the gradients instead of performing a side effect like updating `.grad`.
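
For reference, a minimal sketch of what that looks like with `torch.autograd.grad()` on the same toy example (the name `second` is just illustrative):

``````
import torch as T
from torch.nn import Parameter

w = Parameter(T.tensor([2.2]))
phi = Parameter(T.tensor([1.5]))
wp = w*phi

# gradient of wp wrt phi only; create_graph=True keeps a graph on the result
grd, = T.autograd.grad(wp, phi, create_graph=True)
print(grd)      # tensor([2.2000], grad_fn=<MulBackward0>)

# gradient of grd wrt w, returned directly instead of being accumulated into w.grad
second, = T.autograd.grad(grd, w)
print(second)   # tensor([1.])
``````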