My DQN doesn't learn

Hi, I’m new to reinforcement learning and am trying to implement DQN as proposed in the original paper.

But as the title says, this DQN doesn’t seem to learn, even after 1 million steps. The target value and the Q-value do move close to each other, but the accumulated reward (and the loss) doesn’t increase.

I cannot figure out what is wrong. I would be grateful if you could point out the problem in my code. Any other advice is also welcome.

Thank you in advance.
