Hi!
I’m using a standard autonomous car program.
Something strange is happening and I can't figure out why.
I push each transition into my memory like this:
self.memory.push((self.last_state, new_state, torch.LongTensor([self.last_action]), torch.Tensor([self.last_reward])))
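For reference, here is a minimal sketch of what I believe my ReplayMemory class looks like (I'm using the standard push/sample implementation that came with the program, so the class details below are my assumption about that code, not something I wrote myself):

import random
import torch

class ReplayMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []

    def push(self, event):
        # event = (last_state, new_state, last_action, last_reward)
        self.memory.append(event)
        if len(self.memory) > self.capacity:
            del self.memory[0]

    def sample(self, batch_size):
        # random.sample draws batch_size transitions at random from the
        # whole buffer, and zip(*...) regroups them field by field
        samples = zip(*random.sample(self.memory, batch_size))
        return map(lambda x: torch.cat(x, 0), samples)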
But when I sample from my memory with the following instruction:
batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(10)
the data I get back in batch_action and batch_reward doesn't match what I stored.
For example: the last rewards stored in my memory are all equal to -1, but in the sampled batch I get something like [-1, -1, 0, -1, 0, -1, -1, -1, -1, -1].
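To be concrete, this is roughly how I'm checking it (self.memory.memory is my guess at the name of the buffer's internal list):

# Print the last 10 rewards as they sit in the buffer:
print([r.item() for (_, _, _, r) in self.memory.memory[-10:]])
# -> all -1.0

# Then sample and look at the rewards that come back:
batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(10)
print(batch_reward)
# -> contains 0s, e.g. [-1, -1, 0, -1, 0, -1, -1, -1, -1, -1]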
I don't understand what is going on; any ideas would be much appreciated.
Thanks a lot
Ben