Clone the grad attribute

I am implementing an algorithm of the following type:

At each iteration, the algorithm computes the gradients of two objectives f and g with respect to the parameters, combines them in some way, and then uses the combined result as the effective gradient for SGD or Adam… (The combination of the gradients is not linear, so I can’t just combine the objectives f and g first and take a single gradient.)

Is it true that the best way for me to do it is to compute the gradient of f by

opt.zero_grad()
f.backward()

and then clone all the .grad attributes of the parameters

and then do the same thing for g

and then combine them, assign the new effective gradients to the .grad attributes, and call opt.step() at the end?
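Concretely, something like this (a rough sketch of what I have in mind, assuming a model named model, an optimizer opt, and a placeholder combine() standing in for my nonlinear combination):

opt.zero_grad()
f.backward(retain_graph=True)   # retain the graph in case f and g share parts of it
grads_f = [p.grad.clone() for p in model.parameters()]

opt.zero_grad()
g.backward()
grads_g = [p.grad.clone() for p in model.parameters()]

for p, gf, gg in zip(model.parameters(), grads_f, grads_g):
    p.grad = combine(gf, gg)    # combine() is the placeholder for the nonlinear rule
opt.step()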

Is there any better way to do this?

I’m not sure why you would need to clone them. You can just calculate and reassign.

The better way is probably writing your own optimizer that extends torch.optim.Optimizer.
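Something along these lines, for example (just a rough sketch of the mechanics with a plain SGD update; the hypothetical CombinedGradSGD assumes the combined gradient has already been written into each parameter’s .grad before step() is called):

import torch
from torch.optim import Optimizer

class CombinedGradSGD(Optimizer):
    # Sketch: plain SGD step that assumes the caller has already stored
    # the combined gradient in each parameter's .grad.
    def __init__(self, params, lr=0.01):
        super(CombinedGradSGD, self).__init__(params, dict(lr=lr))

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # in-place update: p <- p - lr * grad
                p.data.add_(p.grad.data, alpha=-group['lr'])
        return loss

You could also move the combination itself into step() by passing both gradient lists in; the mechanics are the same.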

A slightly neater way:

p = list(model.parameters())
grad_f = torch.autograd.grad(f, p, retain_graph=True)  # keep the graph in case f and g share it
grad_g = torch.autograd.grad(g, p)

for i in range(len(p)):
    p[i].grad = func(grad_f[i], grad_g[i])  # func is your nonlinear combination

I think you would need the clone because when I call g.backward(), the .grad attributes will be changed.
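For example, my understanding is that gradients accumulate into .grad rather than replacing it, roughly like this:

import torch
w = torch.tensor(1.0, requires_grad=True)
(2 * w).backward()
print(w.grad)   # tensor(2.)
(3 * w).backward()
print(w.grad)   # tensor(5.) -- accumulated, not overwritten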

Actually, I am not sure what you mean by calculating and reassigning.

Regarding “The better way is probably writing your own optimizer that extends torch.optim.Optimizer.”

There is no issue with extending the optimizer. The question is how to extend it. That’s exactly what I asked.

Thanks!

But I have no idea what torch.autograd.grad does… Does it just return a list of tensors containing the gradients of f with respect to p?

If that’s the case then this should work for me.

It will return a tuple of Variables. By default they are requires_grad=False Variables, so they are basically tensors.

Oh I see. Yeah, they will be changed. So regarding the first route, @ruotianluo's approach is better.

I see. Just to clarify: the value of grad_f won’t be changed by any other operations in the future, right? (By contrast, the .grad attributes can potentially be changed by other operations.)

So this sounds like a much cleaner way to deal with this kind of issue. Thanks!

They won’t be changed. Also, if you use torch.autograd.grad, by default the .grad attributes of the parameters will remain None.
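A quick toy check of both points:

import torch
w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()
(gw,) = torch.autograd.grad(loss, [w])
print(gw.requires_grad)  # False -- detached result, won't be modified later
print(w.grad)            # None  -- autograd.grad leaves .grad untouched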