Hi,
I’m modifying the DQN tutorial code for another task.
At the end of an episode, I need to adjust all the rewards for that episode in the ReplayMemory while keeping the other fields of each namedtuple unchanged.
For example, if I have the following code:
from collections import namedtuple
import torch
Transition = namedtuple('Transition',
('state', 'action', 'next_state', 'reward'))
reward1 = torch.tensor([2.0], dtype=torch.float32)
reward2 = torch.tensor([4.0], dtype=torch.float32)
memory = []
memory.append(Transition('275', '54000', '0.0', reward1))
memory.append(Transition('275', '54000', '0.0', reward2))
memory
I create the following memory:
[Transition(state='275', action='54000', next_state='0.0', reward=tensor([2.])),
Transition(state='275', action='54000', next_state='0.0', reward=tensor([4.]))]
Then I create my batch:
batch = Transition(*zip(*memory))
batch
which looks like this:
Transition(state=('275', '275'), action=('54000', '54000'), next_state=('0.0', '0.0'), reward=(tensor([2.]), tensor([4.])))
And from here, I can extract just the batch of rewards:
reward_batch = torch.cat(batch.reward)
reward_batch
which looks like this:
tensor([2., 4.])
I can now adjust the rewards, e.g. like this:
adjustment = 5.0
reward_batch += adjustment
reward_batch
tensor([7., 9.])
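One thing I noticed while doing this (and I assume it's why a write-back step is needed at all): torch.cat copies the data, so adjusting reward_batch in place does not touch the tensors stored in memory. A minimal check:

```python
import torch

r1 = torch.tensor([2.0])
r2 = torch.tensor([4.0])

cat = torch.cat([r1, r2])  # cat owns a fresh copy of the data
cat += 5.0                 # in-place add affects the copy only

print(r1)   # tensor([2.]) -- the original tensor is untouched
print(cat)  # tensor([7., 9.])
```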
So, my question is: how do I write the adjusted reward_batch back into the Transition namedtuples of the ReplayMemory, so that the memory looks like this:
[Transition(state='275', action='54000', next_state='0.0', reward=tensor([7.])),
Transition(state='275', action='54000', next_state='0.0', reward=tensor([9.]))]
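For context, the closest I've come so far is rebuilding each entry with the namedtuple's _replace method, which returns a new tuple with only the named field swapped out — but I'm not sure this is the idiomatic way to do it for a ReplayMemory. A self-contained sketch of what I mean:

```python
from collections import namedtuple
import torch

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))

memory = [Transition('275', '54000', '0.0', torch.tensor([2.0])),
          Transition('275', '54000', '0.0', torch.tensor([4.0]))]

# Batch the rewards and apply the adjustment, as above
reward_batch = torch.cat(Transition(*zip(*memory)).reward) + 5.0

# Write each adjusted reward back; _replace keeps the other fields as-is.
# unsqueeze(0) turns the 0-dim element back into a 1-element tensor.
for i, r in enumerate(reward_batch):
    memory[i] = memory[i]._replace(reward=r.unsqueeze(0))

print(memory[0].reward)  # tensor([7.])
print(memory[1].reward)  # tensor([9.])
```

Is this reasonable, or is there a cleaner way?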
Thank you in advance for any help.