What was the final consensus? I’ve tried most of the suggestions here with no improvements:
switched from pixel input to the Gym state observations
tried MSE loss
tuned the learning rate (0.001, 0.0001)
changed the target network update period (10, 100, 1000)
None of these worked, and the average duration stays around 20 timesteps.
Even with the original PyTorch implementation, the average duration tops out at around 50 timesteps.
In my case, L1 loss required a much longer target-network synchronization interval for training to succeed.
In the CartPole-v0 environment with the numerical state representation (not the image-based one), L2 loss works well when the target network is synchronized every 100 frames, but L1 loss needed a synchronization interval of at least 5000 frames.
I never tried Huber loss, but I would expect it to behave essentially like L1 loss, since it is linear for large errors.
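To make the comparison concrete, here is a minimal sketch (not taken from the tutorial; the tiny linear networks and the `TARGET_SYNC` value are illustrative assumptions) showing the three loss choices discussed above and the target-network sync pattern. Note that for large TD errors MSE grows quadratically while L1 and Huber grow linearly, which is why Huber is expected to behave like L1 in that regime:

```python
import torch
import torch.nn as nn

# Hypothetical tiny Q-networks for CartPole's 4-dim state, 2 actions
policy_net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(policy_net.state_dict())

TARGET_SYNC = 5000  # frames between syncs; illustrative, per the comment L1 needed >= 5000

# the loss choices discussed in the thread
l2_loss = nn.MSELoss()       # L2 / "mse loss"
l1_loss = nn.L1Loss()        # L1
huber = nn.SmoothL1Loss()    # Huber: quadratic near 0, linear for large errors

# example Q-values vs targets with one large error (10 vs 0)
q = torch.tensor([1.0, 2.0, 10.0])
target = torch.tensor([1.5, 2.0, 0.0])
print(l2_loss(q, target).item())  # large error dominates quadratically
print(l1_loss(q, target).item())  # linear in the error
print(huber(q, target).item())    # close to L1 here, slightly smaller

# sync pattern inside the training loop (sketch)
for frame in range(1, 2 * TARGET_SYNC + 1):
    # ... sample a batch, compute the chosen loss, optimizer step ...
    if frame % TARGET_SYNC == 0:
        target_net.load_state_dict(policy_net.state_dict())
```

With these numbers the L2 loss is roughly an order of magnitude larger than L1 or Huber, so the same learning rate effectively takes much bigger steps, which is one plausible reason the two losses want different sync intervals.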