How to convert a model's parameters to a tensor and keep grad?

Hi all,

I’d like to compute the second derivative of the loss w.r.t. the parameters of a model (an nn.Module), but I cannot do this:

params = torch.cat([p.flatten() for p in policy.parameters()], dim=0)

for i in range(200):
    y = get_expected_return(policy)
    # This fails: y is computed from policy.parameters(), and the
    # concatenated params tensor never appears in the graph of y.
    grad = torch.autograd.grad(y, params, create_graph=True)[0]

    with torch.no_grad():
        params += grad

So, I need to do this:

for i in range(200):
    y = get_expected_return(policy)
    grad = torch.autograd.grad(y, policy.parameters(), create_graph=True)

    with torch.no_grad():
        # One gradient tensor per parameter, in matching order.
        for p, g in zip(policy.parameters(), grad):
            p += g

But the first approach is what I need for the second derivative. So, does anybody know how to make the first snippet work?

Hi,

Why would the first one be required?
The second one looks fine to me.
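
That said, if you really do want a single flat vector you can differentiate w.r.t., the forward pass has to actually use that vector. A rough sketch of one way to do this, assuming a PyTorch version that has torch.func.functional_call (flat, unflatten and obs here are illustrative names, not from your code):

import torch
from torch.func import functional_call

names = [n for n, _ in policy.named_parameters()]
shapes = {n: p.shape for n, p in policy.named_parameters()}
sizes = [p.numel() for _, p in policy.named_parameters()]

# One leaf tensor holding all the parameters.
flat = torch.cat([p.detach().flatten() for p in policy.parameters()]).requires_grad_(True)

def unflatten(vec):
    # Per-parameter views that stay connected to `vec` in the autograd graph.
    return {n: c.view(shapes[n]) for n, c in zip(names, vec.split(sizes))}

for i in range(200):
    # The forward pass must go through `flat`, e.g. via functional_call;
    # get_expected_return would need a variant that works this way.
    out = functional_call(policy, unflatten(flat), (obs,))
    y = out.sum()  # stand-in for the real expected-return computation
    grad = torch.autograd.grad(y, flat, create_graph=True)[0]

    # A second grad of grad's entries w.r.t. `flat` would now give full
    # Hessian rows, since everything depends on the one flat vector.
    with torch.no_grad():
        flat += grad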

Hi albanD,

Thanks for your answer, and sorry for my late response (I did not get a notification).

The second solution works, but now how do I compute the second derivative?


for i in range(200):
    y = get_expected_return(policy)
    grad = torch.autograd.grad(y, policy.parameters(), create_graph=True)

    for gvec, (name, theta) in zip(grad, policy.named_parameters()):
        gvec = gvec.flatten()

        # One row per entry of gvec: d(dy/dtheta_i)/dtheta for this parameter.
        H = [torch.autograd.grad(g, theta, create_graph=True)[0].flatten().unsqueeze(0)
             for g in gvec]
        H = torch.cat(H, dim=0)  # (numel(theta), numel(theta)) block

Is this correct?

Consider two parameter sets, x1 and x2. The H should be computed as the full block matrix:

H = [ d2y/dx1dx1   d2y/dx1dx2 ]
    [ d2y/dx2dx1   d2y/dx2dx2 ]

In my code, where am I computing the upper-right and lower-left blocks?

Is this correct?

Yes that looks correct.

In my code, where am I computing the upper-right and lower-left blocks?

You are computing the whole thing.
You have two loops over the params, creating the full n x n matrix.
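
For concreteness, here is a minimal sketch of that double loop (it assumes y is the scalar from get_expected_return, as in your code; allow_unused handles gradient entries that do not depend on every parameter):

params = list(policy.parameters())
grads = torch.autograd.grad(y, params, create_graph=True)
flat_grad = torch.cat([g.flatten() for g in grads])  # length n

rows = []
for g in flat_grad:  # first loop: over all n gradient entries
    # Second loop (inside grad): differentiate this entry w.r.t. every
    # parameter, which yields one full row of H, cross terms included.
    row = torch.autograd.grad(g, params, retain_graph=True, allow_unused=True)
    rows.append(torch.cat([
        (r if r is not None else torch.zeros_like(p)).flatten()
        for r, p in zip(row, params)
    ]))

H = torch.stack(rows)  # (n, n); the off-diagonal blocks are the cross terms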

Hi Alban,

Yes, but what if there are two weight matrices, Wx and Wr, with RX and RR entries respectively (as in an RNN)?

Then H has n = (RX+RR)**2 entries in total, so each row is of size r = RX+RR.

How can this now be used to update Wx and Wr? Or can you slice?

E.g. H[:, :RX] for Wx and H[:, RX:] for Wr?
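
Something like this is what I have in mind (an illustrative sketch; it assumes the flat ordering is [Wx entries, then Wr entries], so H has shape (RX+RR, RX+RR), and Wx, Wr, flat_grad refer to the tensors above):

# Row/column slices of H give the four blocks:
H_xx = H[:RX, :RX]   # d2y/dWx dWx
H_xr = H[:RX, RX:]   # upper-right block: d2y/dWx dWr
H_rx = H[RX:, :RX]   # lower-left block:  d2y/dWr dWx
H_rr = H[RX:, RX:]   # d2y/dWr dWr

# E.g. a naive Newton-style step uses the whole H, then scatters back:
step = torch.linalg.solve(H, flat_grad)  # assumes H is invertible
with torch.no_grad():
    dWx, dWr = step.split([RX, RR])
    Wx -= dWx.view_as(Wx)
    Wr -= dWr.view_as(Wr)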