Backprop in a network with two heads

I am currently trying to understand the actor-critic example for the cart-pole environment.

I understand the general principle and how the algorithm works. However, in this code the neural net has two heads: one output for the action probabilities and another for the predicted future reward (the value) of the current state.
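
For reference, the two-headed network in that example looks roughly like this (sketched from memory, so details such as layer sizes may differ from the actual example code):

    import torch.nn as nn
    import torch.nn.functional as F

    class Policy(nn.Module):
        """Rough shape of the two-headed actor-critic net."""
        def __init__(self):
            super().__init__()
            self.affine1 = nn.Linear(4, 128)      # shared trunk
            self.action_head = nn.Linear(128, 2)  # actor head: action probabilities
            self.value_head = nn.Linear(128, 1)   # critic head: state value
            self.saved_actions = []
            self.rewards = []

        def forward(self, x):
            x = F.relu(self.affine1(x))
            action_prob = F.softmax(self.action_head(x), dim=-1)
            state_value = self.value_head(x)
            return action_prob, state_value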

My question is about these few lines of code:

    # reset gradients
    optimizer.zero_grad()

    # sum up all the values of policy_losses and value_losses
    loss = torch.stack(policy_losses).sum() + torch.stack(value_losses).sum()

    # perform backprop
    loss.backward()
    optimizer.step()

    # reset rewards and action buffer
    del model.rewards[:]
    del model.saved_actions[:]

Here we are adding the policy loss and the value loss. But as I understood it, we should call backward once for the policy loss and once for the value loss. Why is it sufficient here to just add them up and call backward once?

Thanks in advance :slight_smile:

If you add both losses, the gradient magnitude and direction for the weights do not change compared to calling backward twice, once per loss. Mathematically the resulting gradients are the same in both cases, because the gradient of a sum is the sum of the gradients and PyTorch accumulates gradients across backward calls, so taking the derivative once is simply more efficient.
Seen from another point of view, your network cannot optimize two loss functions separately anyway; in your case you really have a single loss function made up of two sub-losses, policy_losses + value_losses.
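
To see it concretely, here is a tiny sketch (with a made-up parameter and made-up sub-losses, not the actual cart-pole model) showing that one backward call on the summed loss gives the same gradients as two separate backward calls, because gradients accumulate in .grad:

    import torch

    # tiny stand-in for the two-headed net: one shared trunk parameter
    w = torch.tensor([1.0, 2.0], requires_grad=True)
    h = w * 2.0                    # shared hidden activation

    policy_loss = h.sum()          # stand-in for the actor loss
    value_loss = (h ** 2).sum()    # stand-in for the critic loss

    # variant 1: two separate backward calls; gradients accumulate in w.grad
    policy_loss.backward(retain_graph=True)  # keep the shared graph for the 2nd call
    value_loss.backward()
    grad_two_calls = w.grad.clone()

    # variant 2: sum the losses first, single backward call
    w.grad = None
    h = w * 2.0
    (h.sum() + (h ** 2).sum()).backward()
    grad_one_call = w.grad.clone()

    print(torch.allclose(grad_two_calls, grad_one_call))  # True

Note the retain_graph=True on the first call: with two separate backward passes through a shared trunk you have to keep the graph alive for the second pass, which is another reason summing first is the more convenient option.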

Hope this helps.
Best.

Yes, your explanation makes perfect sense.

Thank you very much