What was the final consensus? I’ve tried most of the suggestions here with no improvements:
- changed pixels to gym environment
- tried mse loss
- tuned learning rate, 0.001, 0.0001
- changed target network update cycle, 10, 100, 1000
None of these worked, and the average duration stays around 20 timesteps.
Even with the original PyTorch implementation, at max it stays at 50 timesteps average duration.