How to adjust the rewards in the namedtuple ReplayMemory of the DQN Tutorial


#1

Hi,
I’m modifying the DQN tutorial code for another task.

At the end of an episode, I need to be able to adjust all the rewards for the episode in the ReplayMemory while keeping the other elements of the namedtuple as they are.

For example, if I have the following code:

from collections import namedtuple
import torch

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))
reward1 = torch.tensor([2.0], dtype=torch.float32)
reward2 = torch.tensor([4.0], dtype=torch.float32)
memory = []
memory.append(None)
memory[0] = Transition('275', '54000', '0.0', reward1)
memory.append(None)
memory[1] = Transition('275', '54000', '0.0', reward2)
memory

I create the following memory:

[Transition(state='275', action='54000', next_state='0.0', reward=tensor([2.])),
 Transition(state='275', action='54000', next_state='0.0', reward=tensor([4.]))]

Then I create my batch:

batch = Transition(*zip(*memory))
batch

which looks like this:

Transition(state=('275', '275'), action=('54000', '54000'), next_state=('0.0', '0.0'), reward=(tensor([2.]), tensor([4.])))

From here, I can extract just the reward batch:

reward_batch = torch.cat(batch.reward)
reward_batch

which looks like this:

tensor([2., 4.])

I can now adjust the rewards, e.g. like this:

adjustment = 5.0
reward_batch += adjustment
reward_batch
>>>tensor([7., 9.])

So my question is: how do I re-insert the adjusted reward_batch into the namedtuples of the ReplayMemory so that the memory looks like this:

[Transition(state='275', action='54000', next_state='0.0', reward=tensor([7.])),
 Transition(state='275', action='54000', next_state='0.0', reward=tensor([9.]))]

Thank you in advance for any help.


#2

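The answer is to use recordtype instead of namedtuple! A namedtuple is immutable, so its reward field cannot be assigned to in place, whereas a recordtype is mutable. See here: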

https://stackoverflow.com/questions/31252939/changing-values-of-a-list-of-namedtuples
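
For anyone who would rather stay with namedtuple, here is a minimal sketch of an alternative using the standard library's _replace() method, which returns a copy of a namedtuple with the chosen fields changed. The helper name replace_rewards is hypothetical (it is not part of the DQN tutorial), and plain floats stand in for the reward tensors so the snippet runs without torch.

```python
from collections import namedtuple

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


def replace_rewards(memory, new_rewards):
    """Return a new list in which each Transition carries its new reward.

    Hypothetical helper, not part of the tutorial. namedtuples are
    immutable, so _replace() builds a fresh Transition with only the
    'reward' field changed; the other fields are reused as they are.
    """
    if len(memory) != len(new_rewards):
        raise ValueError('need exactly one new reward per transition')
    return [t._replace(reward=r) for t, r in zip(memory, new_rewards)]


# Assumption: plain floats stand in for the reward tensors of the question.
memory = [Transition('275', '54000', '0.0', 2.0),
          Transition('275', '54000', '0.0', 4.0)]

adjustment = 5.0
adjusted = [t.reward + adjustment for t in memory]

# Slice assignment updates the existing list object in place.
memory[:] = replace_rewards(memory, adjusted)
print(memory)
```

With the tensors from the question, the adjusted reward_batch would first need to be cut back into one-element tensors, for example with reward_batch.split(1), before being passed in as new_rewards. Note also that torch.cat returns a new tensor, so the in-place += on reward_batch in the question leaves the rewards stored in memory unchanged, which is why they have to be written back explicitly.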