DQN example from PyTorch diverged!

What was the final consensus? I’ve tried most of the suggestions here with no improvements:

  • switched the input from rendered pixels to the gym environment’s state observation (see the sketch below)
  • tried MSE loss
  • tuned the learning rate (0.001, 0.0001)
  • changed the target network update cycle (10, 100, 1000)

None of these worked; the average duration stays around 20 timesteps. Even with the original PyTorch implementation, it tops out at around 50 timesteps average duration.
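
In case it helps others trying the same change, here is a minimal sketch of what I mean by using the gym state observation instead of pixels. It assumes a small MLP (the name QNet, the hidden width of 128, and the layer count are my own choices, not from the tutorial):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    # Small MLP over the 4-dimensional CartPole-v0 state vector
    # (cart position, cart velocity, pole angle, pole angular velocity)
    # instead of rendered screen pixels.
    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# The observation returned by env.reset() / env.step() is fed directly, e.g.:
# state = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
# q_values = QNet()(state)
```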

I can’t say it’s solved, but a working version can be found at https://github.com/tqjxlm/Simple-DQN-Pytorch.

It’s not deterministic; I restarted several times to get a good result.

  • The 100-episode mean reached 200 after 700 episodes and reached 500 later
  • It uses 3 stacked frames of rendered pixels as input
  • It uses prioritized replay memory, double DQN, and dueling DQN (see the sketch after this list)
  • It uses MSE loss and the Adam optimizer
  • It neither converges nor diverges; the 100-episode mean stays above 200 once it gets there
  • Other hyperparameters can be found on the GitHub page
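
To make the double/dueling part of the list above concrete, here is a rough sketch of a dueling head plus the double-DQN target computation. It is my own paraphrase (names like DuelingQNet and double_dqn_target are made up, and prioritized replay is omitted), not the code from that repository:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Dueling architecture: shared conv trunk, then separate value and
    # advantage streams recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    def __init__(self, in_channels=3, n_actions=2):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.value = nn.LazyLinear(1)               # V(s)
        self.advantage = nn.LazyLinear(n_actions)   # A(s, a)

    def forward(self, x):
        h = self.trunk(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(policy_net, target_net, reward, next_state, done, gamma=0.99):
    # Double DQN: the online network picks the argmax action,
    # the target network evaluates it.
    with torch.no_grad():
        next_action = policy_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * next_q * (1.0 - done)
```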

That is a dueling DQN, not the DQN from the Atari/Nature paper.

In my case, L1 loss required a much lower target-network synchronization frequency for successful training.

In the CartPole-v0 environment with the numerical state representation (not the image-based one), L2 loss works well when the target network is synchronized every 100 frames, but L1 loss needed a synchronization interval of at least 5000 frames.
Although I never tried Huber loss, I expect it to behave basically the same as L1 loss.
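
For clarity, this is roughly what I mean by counting the synchronization interval in frames, with the loss as a swappable knob. It is only a sketch (optimize_step, TARGET_SYNC_FRAMES, and the discount of 0.99 are illustrative, not my exact code):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()          # L2: syncing every ~100 frames worked for me
# criterion = nn.L1Loss()         # L1: needed a much longer sync interval
# criterion = nn.SmoothL1Loss()   # Huber: I expect it to behave like L1
TARGET_SYNC_FRAMES = 100          # raise to >= 5000 when using L1

frame_count = 0

def optimize_step(policy_net, target_net, optimizer, batch, gamma=0.99):
    # One optimization call per environment frame, so frame_count
    # doubles as the frame counter for target-network synchronization.
    global frame_count
    frame_count += 1
    state, action, reward, next_state, done = batch  # action: LongTensor of shape (B, 1)
    q = policy_net(state).gather(1, action).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * target_net(next_state).max(1).values * (1.0 - done)
    loss = criterion(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Copy the online weights into the target network every TARGET_SYNC_FRAMES frames.
    if frame_count % TARGET_SYNC_FRAMES == 0:
        target_net.load_state_dict(policy_net.state_dict())
```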