I was going through this tutorial and noticed the following code:
```python
# Discount future rewards back to the present using gamma
for r in range(len(policy.episode_rewards)):
    R = r + gamma * R
    rewards.insert(0, R)
pdb.set_trace()

# Scale rewards
rewards = torch.FloatTensor(rewards)
rewards = (rewards - rewards.mean()) / \
          (rewards.std() + np.finfo(np.float32).eps)

# Calculate loss
loss = (torch.sum(torch.mul(policy.episode_actions, rewards).mul(-1), -1))
```
In the past I've run into many issues and unexpected errors when implementing losses myself, so I was quite skeptical of this.
What is the real PyTorch way to do this?
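For comparison, here is what I would have expected the idiomatic version to look like, using `torch.distributions.Categorical` and `log_prob` instead of multiplying stored action tensors by hand. This is only my sketch, not code from the tutorial, and the variable names (`episode_rewards`, `logits`, `actions`) are my own placeholders for one episode of data:

```python
import torch
from torch.distributions import Categorical

gamma = 0.99

# Toy stand-ins for one episode (hypothetical names, not from the tutorial)
episode_rewards = [1.0, 0.0, 2.0]               # rewards r_t collected per step
logits = torch.randn(3, 4, requires_grad=True)  # per-step action logits, 4 actions
actions = torch.tensor([0, 2, 1])               # actions actually taken

# Discount future rewards back to the present:
# walk the episode backwards so R accumulates gamma-discounted returns
R = 0.0
returns = []
for r in reversed(episode_rewards):
    R = r + gamma * R
    returns.insert(0, R)
returns = torch.tensor(returns)

# Normalize returns; the small eps avoids division by zero
returns = (returns - returns.mean()) / (returns.std() + 1e-8)

# REINFORCE loss: negative log-probability of the taken actions,
# weighted by the normalized returns
dist = Categorical(logits=logits)
loss = -(dist.log_prob(actions) * returns).sum()
loss.backward()
```

With `log_prob`, autograd handles the gradient of the policy term directly, so there is no need to stack per-step loss tensors manually. I'd be glad to know whether this matches what the tutorial's loss is actually computing.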