I am implementing an algorithm of the following type:
at each iteration, the algorithm computes the gradients of two objectives f and g with respect to the parameters, combines them in some way, and then uses the combined result as the effective gradient for SGD or Adam. (The combination of the gradients is not linear, so I can't just combine the objectives f and g first and take a single gradient.)
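For concreteness, here is one example of the kind of nonlinear combination I mean (a PCGrad-style projection, purely for illustration; my actual func is different):

import torch

# Hypothetical nonlinear combiner, just an illustration of the general shape:
def func(gf, gg):
    dot = torch.sum(gf * gg)
    if dot < 0:  # the two gradients conflict:
        # drop the component of gf that opposes gg, then sum
        gf = gf - dot / (gg.norm() ** 2 + 1e-12) * gg
    return gf + gg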
Is it true that the best way for me to do it is to compute the gradient of f by
opt.zero_grad()
f.backward()
then clone all the .grad attributes of the parameters, do the same thing for g, then combine the two sets of gradients, assign the .grad attributes to the new effective gradients, and call opt.step() at the end?
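In code, a minimal sketch of what I mean (assuming model, opt, the scalar losses f and g, and a combiner func are already defined):

opt.zero_grad()
f.backward(retain_graph=True)  # retain the graph: g.backward() needs it if f and g share it
grad_f = [p.grad.clone() for p in model.parameters()]
opt.zero_grad()
g.backward()
grad_g = [p.grad.clone() for p in model.parameters()]
for p, gf, gg in zip(model.parameters(), grad_f, grad_g):
    p.grad = func(gf, gg)  # overwrite .grad with the combined gradient
opt.step()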
p = list(model.parameters())
grad_f = torch.autograd.grad(f, p, retain_graph=True)  # keep the graph alive for the second call
grad_g = torch.autograd.grad(g, p)
for param, gf, gg in zip(p, grad_f, grad_g):
    param.grad = func(gf, gg)  # write the combined gradient back, then call opt.step()
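For reference, a self-contained toy run of this pattern (the Linear model, the two losses, and this func are all placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)

def func(gf, gg):  # placeholder combiner; the real one is nonlinear
    return 0.5 * (gf + gg)

for step in range(3):
    out = model(x)
    f = out.pow(2).mean()  # toy objective f
    g = out.abs().mean()   # toy objective g
    p = list(model.parameters())
    grad_f = torch.autograd.grad(f, p, retain_graph=True)
    grad_g = torch.autograd.grad(g, p)
    for param, gf, gg in zip(p, grad_f, grad_g):
        param.grad = func(gf, gg)
    opt.step()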
I see. Just to clarify: the values in grad_f won't be changed by any future operations, right? (By contrast, the .grad attributes can potentially be changed by other operations, e.g. another backward() call accumulating into them.)
So this sounds like a much cleaner way to deal with this kind of issue. Thanks!
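Yes, that's right: torch.autograd.grad returns new tensors that are not aliased to the parameters' .grad attributes, so subsequent backward() calls, which accumulate into .grad, never touch them. A quick sanity check:

import torch

w = torch.ones(2, requires_grad=True)
loss = (w * w).sum()
(gf,) = torch.autograd.grad(loss, w, retain_graph=True)
saved = gf.clone()
loss.backward()           # accumulates 2*w into w.grad; gf is a separate tensor
(3 * w).sum().backward()  # accumulates another 3 into w.grad
print(torch.equal(gf, saved))  # True: gf was never touched
print(w.grad)                  # tensor([5., 5.]): .grad kept accumulating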