I’d like to compute the second derivative of the loss w.r.t. the parameters of a model (nn.Module), but this does not work:
params = torch.cat([p.flatten() for p in policy.parameters()], dim=0)
for i in range(200):
    y = get_expected_return(policy)
    grad = torch.autograd.grad(y, params, create_graph=True)[0]
    with torch.no_grad():
        params += grad
So, I need to do this:
for i in range(200):
    y = get_expected_return(policy)
    grad = torch.autograd.grad(y, policy.parameters(), create_graph=True)
    with torch.no_grad():
        for p, g in zip(policy.parameters(), grad):
            p += g
But the first, flattened version is what I need for the second derivative. Does anybody know how to make the first snippet work?
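One way to keep a single flat parameter vector in the graph is torch.func.functional_call (PyTorch ≥ 2.0): slice the flat vector back into per-parameter tensors and run the module with those instead of its own parameters. A minimal sketch, where the tiny policy, the input, and the scalar objective are all hypothetical stand-ins for your get_expected_return:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

policy = nn.Linear(3, 1)  # stand-in for the real policy

def objective_from_flat(flat_params):
    # Rebuild a name -> tensor dict from the flat vector, so the graph
    # flows through flat_params rather than policy's own parameters.
    params = {}
    offset = 0
    for name, p in policy.named_parameters():
        n = p.numel()
        params[name] = flat_params[offset:offset + n].view_as(p)
        offset += n
    x = torch.ones(3)  # stand-in input
    return functional_call(policy, params, (x,)).sum()

# One flat leaf tensor holding all parameters.
flat = torch.cat([p.detach().flatten() for p in policy.parameters()]).requires_grad_(True)

for i in range(3):
    y = objective_from_flat(flat)
    grad = torch.autograd.grad(y, flat, create_graph=True)[0]
    with torch.no_grad():
        flat += grad
```

Because grad is now taken w.r.t. a single tensor, a second torch.autograd.grad over entries of grad gives Hessian rows directly.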
Thanks for your answer, and sorry for my late response (I did not get a notification).
The second solution works, but now how do I compute the second derivative?
for i in range(200):
    y = get_expected_return(policy)
    grad = torch.autograd.grad(y, policy.parameters(), create_graph=True)
    for gvec, (name, theta) in zip(grad, policy.named_parameters()):
        gvec = gvec.flatten()
        H = [torch.autograd.grad(g, theta, create_graph=True)[0].unsqueeze(0) for g in gvec]
Is this correct?
Consider two parameter sets, x1 and x2. Then the Hessian H should be computed by:
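Assuming the formula meant here is the standard block form, H contains the two "own" blocks on the diagonal and the mixed second derivatives off the diagonal:

$$
H = \begin{pmatrix}
\dfrac{\partial^2 y}{\partial x_1 \partial x_1} & \dfrac{\partial^2 y}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 y}{\partial x_2 \partial x_1} & \dfrac{\partial^2 y}{\partial x_2 \partial x_2}
\end{pmatrix}
$$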
In my code, where am I computing the upper-right and lower-left blocks?
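As far as I can tell, differentiating gvec only w.r.t. its own theta yields just the diagonal blocks; the mixed blocks require differentiating each gradient entry w.r.t. all parameters. A sketch of the full Hessian, using a hypothetical toy policy and objective in place of get_expected_return:

```python
import torch
import torch.nn as nn

policy = nn.Linear(2, 1)  # toy stand-in: 3 parameters total (2 weights + 1 bias)
y = policy(torch.ones(2)).sum() ** 2  # toy scalar objective

params = list(policy.parameters())
grads = torch.autograd.grad(y, params, create_graph=True)
g_flat = torch.cat([g.flatten() for g in grads])

# Differentiate every gradient entry w.r.t. *all* parameters, so the
# mixed (upper-right / lower-left) blocks are included.
rows = []
for g in g_flat:
    row = torch.autograd.grad(g, params, retain_graph=True)
    rows.append(torch.cat([r.flatten() for r in row]))
H = torch.stack(rows)  # shape: (n_params, n_params), full Hessian
```

Each row of H is the gradient of one entry of g_flat over the concatenation of all parameters, so the cross blocks between x1 and x2 appear automatically.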