Hi!
I’m using a standard autonomous car program.
Something strange is happening and I can't figure out why.
I push each transition into my memory like this:
self.memory.push((self.last_state, new_state, torch.LongTensor([self.last_action]), torch.Tensor([self.last_reward])))
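For reference, here is a minimal sketch of what I believe my ReplayMemory class looks like (I'm using the standard push/sample implementation that came with the program, so the class details below are my assumption about that code, not something I wrote myself):

import random
import torch

class ReplayMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []

    def push(self, event):
        # event = (last_state, new_state, last_action, last_reward)
        self.memory.append(event)
        if len(self.memory) > self.capacity:
            del self.memory[0]

    def sample(self, batch_size):
        # random.sample draws batch_size transitions at random from the
        # whole buffer, and zip(*...) regroups them field by field
        samples = zip(*random.sample(self.memory, batch_size))
        return map(lambda x: torch.cat(x, 0), samples)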
But when I sample from my memory with the following instruction:
batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(10)
the data I get back in batch_action and batch_reward doesn't match what I stored.
For example: the last rewards stored in my memory are all equal to -1, but in the sampled batch I get something like [-1, -1, 0, -1, 0, -1, -1, -1, -1, -1].
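To be concrete, this is roughly how I'm checking it (self.memory.memory is my guess at the name of the buffer's internal list):

# Print the last 10 rewards as they sit in the buffer:
print([r.item() for (_, _, _, r) in self.memory.memory[-10:]])
# -> all -1.0

# Then sample and look at the rewards that come back:
batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(10)
print(batch_reward)
# -> contains 0s, e.g. [-1, -1, 0, -1, 0, -1, -1, -1, -1, -1]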
I don't understand what is going on; any ideas would be much appreciated.
Thanks a lot
Ben