I’m fixing the seed with `torch.manual_seed()`, and I was expecting the two code snippets below to lead to the same results. However, calling `backward()` inside the loop seems to lead to better performance. I’m struggling to understand why there is a difference between the two. Could this be a numerical problem (does the sum of the loss values overflow)?

`backward()` inside the loop:

```
optimizer.zero_grad()
for loss in episode_losses:
    weighted_loss = loss * reward
    weighted_loss.backward()
optimizer.step()
```

`backward()` on the sum:

```
optimizer.zero_grad()
total_loss = 0
for loss in episode_losses:
    weighted_loss = loss * reward
    total_loss += weighted_loss
total_loss.backward()
optimizer.step()
```
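For what it’s worth, since autograd accumulates gradients in `.grad` across `backward()` calls, the two patterns should produce (nearly) identical gradients. Here is a minimal toy sketch I would use to check that, with a hypothetical linear model and random inputs standing in for the actual episode losses:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
reward = 2.0
inputs = [torch.randn(4) for _ in range(3)]

# Variant 1: backward() per loss; gradients accumulate in p.grad.
optimizer.zero_grad()
for x in inputs:
    loss = model(x).squeeze()
    (loss * reward).backward()
grads_loop = [p.grad.clone() for p in model.parameters()]

# Variant 2: one backward() on the summed loss.
optimizer.zero_grad()
total_loss = sum(model(x).squeeze() * reward for x in inputs)
total_loss.backward()
grads_sum = [p.grad.clone() for p in model.parameters()]

# Compare the two gradient sets up to floating-point rounding.
same = all(torch.allclose(a, b) for a, b in zip(grads_loop, grads_sum))
print(same)
```

If this prints `True` for the real model too, any difference in training outcomes would have to come from somewhere other than the gradients themselves (e.g. a stray extra `step()` or seeding difference), not from the sum overflowing.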