Hi.
I'm trying to convert the ValueDICE code (TF2) to PyTorch, but I got stuck on the backward pass.
ValueDICE uses the same loss twice, once for pi and once for nu, so I tried to do the same, but I'm not sure how to do it properly in PyTorch.
I tried something like this:
# assume: a = pi, b = nu
self.a_optimizer = torch.optim.Adam(self.a.parameters())  # any optimizer
self.b_optimizer = torch.optim.Adam(self.b.parameters())
# ...
loss = ...  # the ValueDICE loss goes here
a_loss = -loss  # + pi regularization
b_loss = loss   # + nu penalty
self.a_optimizer.zero_grad()
a_loss.backward(retain_graph=True)
self.b_optimizer.zero_grad()
b_loss.backward()
self.a_optimizer.step()
self.b_optimizer.step()
In this case, a does not change at all.
I've tried other approaches as well, but I'm not sure how to get it right.
Could you give me a keyword for how PyTorch handles these cases properly?
Umm… in that case the loss may be 0, and backpropagation doesn't seem to happen.
The loss being 0 does not imply that the gradients are 0.
In particular, if you have final_loss = a_loss - b_loss, the value backpropagated to each partial loss will be 1 and -1 respectively.
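For example, here is a minimal sketch (toy tensors, unrelated to your model) where the loss value is 0 but the gradients are not:

import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
final_loss = 3 * x - 3 * y  # the two terms cancel, so the value is 0
final_loss.backward()
print(final_loss.item())  # 0.0
print(x.grad, y.grad)     # tensor(3.) tensor(-3.)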
Thank you for the answer, but we seem to be talking about different things.
# Part of the ValueDICE code (TF2)
loss = (non_linear_loss - linear_loss)
# maybe loss.backward()? I don't think this part is the problem.
nu_loss = loss + nu_grad_penalty * nu_reg
pi_loss = -loss + keras_utils.orthogonal_regularization(self.actor.trunk)
nu_grads = tape.gradient(nu_loss, self.nu_net.variables)
pi_grads = tape.gradient(pi_loss, self.actor.variables)
# or (nu_loss + pi_loss).backward()?
self.nu_optimizer.apply_gradients(zip(nu_grads, self.nu_net.variables))
self.actor_optimizer.apply_gradients(zip(pi_grads, self.actor.variables))
In this example, it doesn't seem possible to simply add the two losses and call backward once in PyTorch.
The main point is not a (pi) and b (nu) themselves: we have to compute gradients of the same loss (non_linear_loss - linear_loss) with respect to different sets of parameters, in opposite directions.
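To make it concrete, here is a toy version of what I mean (made-up shapes and names, not the real networks):

import torch

a = torch.nn.Linear(3, 1)  # stands in for pi
b = torch.nn.Linear(3, 1)  # stands in for nu
x = torch.randn(5, 3)

loss = b(x).mean() - a(x).mean()  # one scalar that depends on both networks
# wanted: grad of  loss w.r.t. b's parameters (the nu step)
#         grad of -loss w.r.t. a's parameters (the pi step)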
Ah, I think I misunderstood: I thought the two losses actually share the parameters, but you want each loss to only contribute to the gradients of a subset of the weights.
In that case, you do indeed want to do two backwards.
You can either just get the grads with autograd.grad, or, if you're using a nightly build of PyTorch, you can tell .backward() which inputs the gradients should be computed for.
from torch import autograd

# Set retain_graph=True in the first call because you backward through the same graph twice
nu_grads = autograd.grad(nu_loss, self.nu_net.parameters(), retain_graph=True)
pi_grads = autograd.grad(pi_loss, self.actor.parameters())
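On a nightly build, the .backward(inputs=...) variant would look something like this (a sketch; I'm assuming your PyTorch modules expose their weights via .parameters()):

# Zero the grads first, then accumulate into .grad only for the listed tensors
self.nu_optimizer.zero_grad()
self.actor_optimizer.zero_grad()
nu_loss.backward(inputs=list(self.nu_net.parameters()), retain_graph=True)
pi_loss.backward(inputs=list(self.actor.parameters()))
self.nu_optimizer.step()
self.actor_optimizer.step()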
Since receiving your answer I have googled quite a bit, but there isn't much material on this. Umm…
Is there a function that applies precomputed gradients, like TF's apply_gradients()?
a_grads = autograd.grad(a_loss, tm.a.parameters(), retain_graph=True)
for p, g in zip(tm.a.parameters(), a_grads):
    p.grad = g  # autograd.grad already returns tensors, so assign them directly
b_grads = autograd.grad(b_loss, tm.b.parameters())
for p, g in zip(tm.b.parameters(), b_grads):
    p.grad = g
tm.a_optimizer.step()
tm.b_optimizer.step()
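As far as I can tell there is no built-in apply_gradients() equivalent, so the loops above could also be wrapped in a small helper to read more like TF (my own function name, not a PyTorch API):

def apply_gradients(grads, params, optimizer):
    # write the precomputed gradients into .grad, then take an optimizer step
    for p, g in zip(params, grads):
        p.grad = g
    optimizer.step()

apply_gradients(a_grads, tm.a.parameters(), tm.a_optimizer)
apply_gradients(b_grads, tm.b.parameters(), tm.b_optimizer)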
Anyway, I confirmed that this seems to be learning. Thank you for your answers so far.