Is this the right way to compute gradients of two losses from two different NNs?

Murtaza_Basu · May 30, 2020, 3:10pm

I have a NN defined in PyTorch and have created two instances of it, self.actor_critic_r1 and self.actor_critic_r2. I calculate the loss of each net, i.e. loss_r1 and loss_r2, sum them, and compute the grads in the following way:

#approach1
    loss_r1 = value_loss_r1 + action_loss_r1 - dist_entropy_r1 * args.entropy_coef
    loss_r2 = value_loss_r2 + action_loss_r2 - dist_entropy_r2 * args.entropy_coef
    self.optimizer_r1.zero_grad()
    self.optimizer_r2.zero_grad()
    loss = loss_r1 + loss_r2
    loss.backward()
    self.optimizer_r1.step()
    self.optimizer_r2.step()
    clip_grad_norm_(self.actor_critic_r1.parameters(), args.max_grad_norm)
    clip_grad_norm_(self.actor_critic_r2.parameters(), args.max_grad_norm)

Alternatively, should I backpropagate each loss and update each network individually, like this:

#approach2
    self.optimizer_r1.zero_grad()
    (value_loss_r1 + action_loss_r1 - dist_entropy_r1 * args.entropy_coef).backward()
    self.optimizer_r1.step()
    clip_grad_norm_(self.actor_critic_r1.parameters(), args.max_grad_norm)
    self.optimizer_r2.zero_grad()
    (value_loss_r2 + action_loss_r2 - dist_entropy_r2 * args.entropy_coef).backward()
    self.optimizer_r2.step()
    clip_grad_norm_(self.actor_critic_r2.parameters(), args.max_grad_norm)

I am not sure which of these is the right approach for updating the networks with multiple losses. Please provide your suggestions.

ptrblck · May 31, 2020, 8:02am

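I don’t know how the losses were calculated, but both approaches should yield the same result if no parameters are shared between the optimizers (and each loss depends only on its own network), since each loss then contributes gradients only to its own network's parameters.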
To verify it, you could seed the code, run a single iteration using both approaches, and compare the gradients and final parameters in both runs.
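
A minimal sketch of that check, assuming two small `nn.Linear` modules as hypothetical stand-ins for `actor_critic_r1` and `actor_critic_r2` and a placeholder loss (the real losses are not shown in the post). It compares only the gradients; the same pattern could be extended to compare parameters after `step()`.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)  # seed so both approaches start from identical weights

# Hypothetical stand-ins for actor_critic_r1 / actor_critic_r2
net1 = nn.Linear(4, 2)
net2 = nn.Linear(4, 2)
x = torch.randn(8, 4)


def placeholder_losses(n1, n2):
    # Assumed losses: each depends on one network only (no shared parameters)
    return n1(x).pow(2).mean(), n2(x).pow(2).mean()


def collect_grads(*nets):
    return [p.grad.clone() for n in nets for p in n.parameters()]


# Approach 1: a single backward() on the summed loss
a1, a2 = copy.deepcopy(net1), copy.deepcopy(net2)
loss_r1, loss_r2 = placeholder_losses(a1, a2)
(loss_r1 + loss_r2).backward()
grads_summed = collect_grads(a1, a2)

# Approach 2: a separate backward() per loss
b1, b2 = copy.deepcopy(net1), copy.deepcopy(net2)
loss_r1, loss_r2 = placeholder_losses(b1, b2)
loss_r1.backward()
loss_r2.backward()
grads_separate = collect_grads(b1, b2)

same = all(torch.allclose(g, h) for g, h in zip(grads_summed, grads_separate))
print("gradients match:", same)
```

If the two networks did share parameters, or one loss depended on the other network's output, this comparison would be expected to fail.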

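Also, it seems you are clipping the gradients too late: clip_grad_norm_ is called after the optimizer has already performed its step, so the clipping has no effect on the update. Call it after backward() and before step().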
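
As a rough sketch of the ordering that takes effect, with a hypothetical `nn.Linear` model, an SGD optimizer, and a made-up `max_grad_norm` standing in for `args.max_grad_norm`:

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)

model = nn.Linear(4, 2)  # hypothetical stand-in for one actor-critic net
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
max_grad_norm = 0.5  # assumed value, plays the role of args.max_grad_norm

# Placeholder loss, scaled up so the raw gradients are likely to need clipping
loss = 100.0 * model(torch.randn(8, 4)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Clip after backward() has filled .grad and before step() consumes it
norm_before = clip_grad_norm_(model.parameters(), max_grad_norm)
optimizer.step()

# Total gradient norm that step() actually used
clipped_norm = torch.norm(
    torch.stack([p.grad.norm() for p in model.parameters()])
)
print(float(norm_before), float(clipped_norm))
```

`clip_grad_norm_` returns the total norm measured before clipping, which can be handy to log.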

Murtaza_Basu · June 1, 2020, 3:59pm

Thank you for the clarification!