Help with DQN (implementation of the paper)

Sorry, I know it’s not great to ask for help checking whether an implementation is correct.

But I am rather new to deep RL, and I don’t know of any debugging tricks or systematic methods for checking whether my implementation is right.

Unlike most supervised learning algorithms, where inspecting the loss gives us some idea of how to tune hyper-parameters (as in cs231n), in RL we can only watch the mean reward (as far as I know), and I don’t know how to tune based on that.

I created this post because I recently rewrote berkerly-rl-hw3 (TensorFlow) as pytorch-dqn (PyTorch). The original already implements preprocessing… and the other tricks used in the paper. I only adapted it to work with PyTorch and rewrote the DQN training algorithm in PyTorch.

But the performance was quite different and disappointing compared to hw3 Section 3, using the same hyper-parameters. I also tried the same settings as the paper, but the results were still quite different. I never reached a mean reward greater than 0.

I tried inspecting the gradients and the data before and after the parts where I suspected there might be bugs.

I already checked the PyTorch DQN tutorial. I think I basically do the same thing it does, except that I use a target network as described in the paper.
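For context, by "target network" I mean the standard DQN target computation: the Bellman target uses a separate, periodically-updated copy of the Q-network, and is zeroed out at terminal states. Here is a rough illustrative sketch (in numpy rather than my actual PyTorch code; the function name and shapes are just for illustration):

```python
import numpy as np

def dqn_targets(rewards, dones, next_q_target, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term zeroed at terminal transitions.

    next_q_target: array of shape (batch, n_actions), Q-values of the
    next states computed by the *target* network (not the online one).
    """
    max_next = next_q_target.max(axis=1)          # max over actions
    return rewards + gamma * (1.0 - dones) * max_next

# tiny sanity check: second transition is terminal, so its target is just r
r = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
q_next = np.array([[0.5, 2.0],
                   [3.0, 1.0]])
y = dqn_targets(r, d, q_next)
# y[0] = 1 + 0.99 * 2 = 2.98, y[1] = 0.0
```

One bug I was specifically watching for is accidentally using the online network instead of the target network for `next_q_target`, or forgetting the `(1 - done)` mask, since either one silently changes the targets.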

I hope someone who is very experienced in RL or PyTorch can give me some suggestions or check my code (just look at the algorithm part, at ), or even collaborate with me to reimplement this paper in PyTorch.


Update: I have now gotten results similar to the DeepMind paper.