I was going through this tutorial and noticed the following code:
import numpy as np
import torch

# Discount future rewards back to the present using gamma
R = 0
rewards = []
for r in policy.episode_rewards[::-1]:
    R = r + gamma * R
    rewards.insert(0, R)

# Scale rewards to zero mean, unit variance (eps avoids division by zero)
rewards = torch.FloatTensor(rewards)
rewards = (rewards - rewards.mean()) / \
    (rewards.std() + np.finfo(np.float32).eps)

# Calculate loss
loss = torch.sum(torch.mul(policy.episode_actions, rewards).mul(-1), -1)
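(If I understand it correctly, the loop builds the discounted return G_t = r_t + gamma * G_{t+1} by working backwards through the episode; for example, with gamma = 0.99 and episode rewards [1, 1, 1], it produces [2.9701, 1.99, 1.0].)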
In the past I've run into many issues and unexpected errors when implementing losses myself, so I was skeptical of this approach.
What is the idiomatic PyTorch way to do this?
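For comparison, this is my understanding of the pattern PyTorch's official REINFORCE example follows: sample actions through torch.distributions.Categorical, store the graph-connected log-probabilities during the episode, and build the loss from those. The names policy_net and log_probs below are mine, and the network is just a CartPole-sized placeholder:

import torch
import torch.nn as nn
from torch.distributions import Categorical

# Placeholder policy network (4 observations -> 2 action probabilities)
policy_net = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1))
log_probs = []  # log pi(a_t | s_t), collected during the episode

def select_action(state):
    probs = policy_net(state)                 # action probabilities
    dist = Categorical(probs)
    action = dist.sample()
    log_probs.append(dist.log_prob(action))   # stays connected to the graph
    return action.item()

# After the episode, with `rewards` being the normalized discounted
# returns computed as in the snippet above:
loss = torch.stack([-lp * R for lp, R in zip(log_probs, rewards)]).sum()
loss.backward()

Is this log-prob-based version the preferred approach, or is multiplying the stored actions by the rewards, as the tutorial does, equivalent?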