PvPZT_ETRON
(Baptiste Juran)
November 28, 2022, 8:50am
1
Hi! I tried to implement my first DQN agent for the gym CartPole environment, but it doesn't seem to learn: the score at the end of training is worse than random play.
I tried a few things:
changing some hyperparameters: learning rate, epsilon-greedy parameters, discount rate
making the network architecture much bigger
removing the target network
None of these helped, and I am very confused about what I'm doing wrong.
Thanks in advance for your help!
import gym
import math, random as rd, numpy as np, copy, matplotlib.pyplot as plt
import torch as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
env = gym.make('CartPole-v1', render_mode='human')
obs, info = env.reset()   # recent gym versions return an (observation, info) tuple
gamma = 0.95              # discount factor
lr = 0.0001               # learning rate
epsilon, epmax, epmin, epdecay = 1, 1, 0.1, 0.005   # epsilon-greedy schedule
N_episodes = 3000
n_input, n_hidden, n_out = 4, 5, 2   # CartPole: 4-dimensional observation, 2 actions
(The rest of the file has been truncated.)
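For reference, the epmax/epmin/epdecay values above are usually plugged into an exponentially decaying epsilon-greedy policy; the sketch below is only an illustration of that pattern (not the truncated code), with q_net standing in for the Q-network:
def select_action(q_net, state, episode):
    # decay epsilon from epmax towards epmin as episodes go by
    eps = epmin + (epmax - epmin) * math.exp(-epdecay * episode)
    if rd.random() < eps:
        return env.action_space.sample()          # explore: random action
    with T.no_grad():
        state_t = T.as_tensor(state, dtype=T.float32).unsqueeze(0)
        return int(q_net(state_t).argmax(dim=1))  # exploit: greedy w.r.t. current Q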
J_Johnson
(J Johnson)
December 4, 2022, 5:39pm
2
Do you have a chart of the training progress? What I've found is that DQNs often get better up to a point and then get much worse if you keep training them, so it's good to set milestones at which you save the model.
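As a minimal sketch of that idea (eval_score and q_net are placeholder names, not taken from the code above), you could keep a checkpoint of the best-scoring weights:
import torch

best_score = float('-inf')
# after evaluating the agent at a milestone:
if eval_score > best_score:
    best_score = eval_score
    torch.save(q_net.state_dict(), 'dqn_best.pt')  # keep the best weights seen so far
# when training ends, roll back to the best checkpoint instead of the final (possibly degraded) weights:
q_net.load_state_dict(torch.load('dqn_best.pt'))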
J_Johnson
(J Johnson)
December 4, 2022, 5:50pm
3
Getting the rewards and the Bellman target right can often be a weak point and may need some tweaking. This developer was having a similar issue (albeit in Keras): DQN debugging using Open AI gym Cartpole - ADG Efficiency
So you might need to review and tweak accordingly. DQNs are an ongoing area of research.
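For reference, the Bellman target for a batch of transitions is usually computed roughly as in the sketch below; q_net, target_net and the batch tensors (states, actions, rewards, next_states, dones) are assumed names rather than anything from the original code, and gamma is the discount factor:
import torch
import torch.nn.functional as F

# states: [B, 4], actions: [B] long, rewards: [B], next_states: [B, 4], dones: [B] in {0., 1.}
q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a) for the taken actions
with torch.no_grad():
    q_next = target_net(next_states).max(dim=1).values              # max over a' of Q_target(s', a')
    target = rewards + gamma * q_next * (1 - dones)                  # no bootstrapping past terminal states
loss = F.smooth_l1_loss(q_pred, target)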
J_Johnson
(J Johnson)
December 4, 2022, 5:54pm
4
Last comment: PyTorch has a tutorial with code you could give a try. When I tried it, the agent did improve over time.
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
vmoens
(Vincent Moens)
December 8, 2022, 2:35pm
5
Minor note here:
We're working on improving the DQN tutorial; you can check it out here:
GitHub pull request: pytorch:master ← SiftingSands:DQN_revise_training (opened 03:53AM - 07 Sep 22 UTC)
Following up on the discussion from https://github.com/pytorch/tutorials/pull/2026
…
I still need to do multiple runs to get a semblance of the statistics of # episodes vs duration for both the original and my changes. The slight increase in model capacity still only uses ~1.5 GB of VRAM, so it should be pretty accessible and training is still relatively quick.
Here's the reward history for one run of these tweaks when I was doing a bunch of trial and error (spent an embarrassing amount of time tweaking hyperparameters and rewards).

@vmoens Feel free to change (or completely discard) anything based on your findings. I haven't tried tweaking anything else in the training pipeline.