Hi, I’m new to reinforcement learning and am trying to implement DQN as proposed in the original paper.
As the title says, this DQN doesn’t seem to learn even after 1 million steps. Although the target value and the Q-value are getting close to each other, the accumulated reward (and the loss) doesn’t increase.
I can’t figure out what is wrong. I would be grateful if you could point out the problem in my code. Any other advice is also welcome.
Thank you in advance.