Need some help with my Double DQN implementation which plateaus long before reaching the Nature results

I’m trying to replicate the Mnih et al. 2015 / Double DQN results on Atari Breakout, but the per-episode reward (one episode being a single Breakout game that terminates after the loss of a single life) plateaus at around 5-6 after about 3-6M frames:

It would be really awesome if anyone could take a quick look here and check for any “obvious” problems. I have tried to comment the code fairly well and to remove any irrelevant parts.

Things I have tried so far:

  • DDQN instead of DQN (the target computation I intend is sketched after this list)
  • Adam instead of RMSProp (training with Adam doesn’t even reach an episode reward > 1; see the gray line in the plot above)
  • various learning rates
  • using the exact hyperparameters from the DQN and DDQN papers (Mnih et al. 2013, 2015, …); see the config sketch after this list
  • fixing lots of bugs
  • training for more than 10M frames (most other implementations I have seen reach a reward about 10x mine after 10M frames)
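
For reference, the target computation I have in mind is the standard Double DQN update from van Hasselt et al. 2016: the online network selects the greedy next action and the target network evaluates it. Below is a minimal PyTorch-style sketch (not my actual code; `online_net`, `target_net`, and the batch field names are placeholders to be mapped onto your own implementation):

```python
import torch

def double_dqn_targets(batch, online_net, target_net, gamma=0.99):
    """Bootstrapped 1-step targets for Double DQN (van Hasselt et al. 2016).

    Assumes `batch` exposes tensors `next_obs`, `rewards`, `dones`
    (placeholder names) and that both networks map observations to
    per-action Q-values of shape [batch_size, n_actions].
    """
    with torch.no_grad():
        # Select the next action with the *online* network ...
        next_actions = online_net(batch.next_obs).argmax(dim=1, keepdim=True)
        # ... but evaluate it with the *target* network. This is the only
        # change from vanilla DQN, which would use target_net(...).max(dim=1).
        next_q = target_net(batch.next_obs).gather(1, next_actions).squeeze(1)
        # Terminal transitions bootstrap with 0.
        targets = batch.rewards + gamma * (1.0 - batch.dones.float()) * next_q
    return targets
```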

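And by “exact hyperparams” I mean the values from the Nature paper (Mnih et al. 2015, Extended Data Table 1), which the DDQN paper largely reuses. Roughly this (the key names are just my labels, not my actual config):

```python
# Values from Mnih et al. 2015, Extended Data Table 1; key names are placeholders.
NATURE_DQN_HYPERPARAMS = {
    "minibatch_size": 32,
    "replay_memory_size": 1_000_000,        # transitions
    "agent_history_length": 4,              # stacked frames per state
    "target_network_update_freq": 10_000,   # parameter updates (C in Algorithm 1)
    "discount_factor": 0.99,
    "action_repeat": 4,                     # frame skip
    "update_frequency": 4,                  # env steps between SGD updates
    "learning_rate": 0.00025,               # RMSProp
    "gradient_momentum": 0.95,
    "squared_gradient_momentum": 0.95,
    "min_squared_gradient": 0.01,
    "initial_exploration": 1.0,             # epsilon-greedy, annealed linearly
    "final_exploration": 0.1,
    "final_exploration_frame": 1_000_000,
    "replay_start_size": 50_000,            # random-policy frames before learning
    "no_op_max": 30,                        # random no-ops at episode start
}
```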
My goal is to fully implement Rainbow DQN, but I would like to get DDQN working properly first.