I'm running the "Play Mario with RL" tutorial code in a Jupyter notebook. I had to modify the cache function because the original code threw an error when run verbatim. On CPU, training reaches episode 1000+ with no problems, but on my local GPU the GPU's memory always runs out around episode 560. Commenting out the cache function stops the memory from filling up, so I'm guessing the problem is in there. Does anything seem wrong at a quick glance? Thanks
class Mario:
    def __init__(self, state_dim, action_dim, save_dir, checkpoint=None):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.memory = deque(maxlen=100000)
        self.batch_size = 64

        self.exploration_rate = 1
        self.exploration_rate_decay = 0.99999975
        self.exploration_rate_min = 0.1
        self.gamma = 0.9

        self.curr_step = 0
        self.burnin = 1e5       # min. experiences before training
        self.learn_every = 3    # no. of experiences between updates to Q_online
        self.sync_every = 1e4   # no. of experiences between Q_target & Q_online sync
        self.save_every = 5e5   # no. of experiences between saving NN
        self.save_dir = save_dir

        # NN to predict the optimal action - implemented in the Learn section
        self.net = DQN(self.state_dim, self.action_dim).float()
        self.net = self.net.to(device)
        if checkpoint:
            self.load(checkpoint)

        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.00025)
        self.loss_fn = torch.nn.SmoothL1Loss()
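For context, here is a rough back-of-envelope estimate of how big a fully GPU-resident replay buffer would get. This assumes the tutorial's standard preprocessed observation shape of 4 stacked 84x84 float32 frames, which I haven't re-verified against my wrappers:

```python
# Rough memory estimate if every replay-buffer tensor lives on the GPU.
# Assumed observation shape: (4, 84, 84) float32 stacked frames.
frame_bytes = 4 * 84 * 84 * 4        # channels * H * W * sizeof(float32)
per_transition = 2 * frame_bytes     # state + next_state dominate the entry
buffer_bytes = 100_000 * per_transition  # deque(maxlen=100000)
print(f"{buffer_bytes / 1e9:.1f} GB")    # far beyond a typical consumer GPU
```

Even well short of the 100k maxlen, tens of thousands of cached transitions would already occupy several GB, which lines up with the crash partway through training.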
... [CODE OMITTED] ...
    def cache(self, state, next_state, action, reward, done):
        """
        Store the experience to self.memory (replay buffer)
        Inputs:
            state (LazyFrame),
            next_state (LazyFrame),
            action (int),
            reward (float),
            done (bool)
        """
        state = torch.FloatTensor(np.array(state)).to(device)
        next_state = torch.FloatTensor(np.array(next_state)).to(device)
        action = torch.LongTensor([action]).to(device)
        reward = torch.DoubleTensor([reward]).to(device)
        done = torch.BoolTensor([done]).to(device)

        self.memory.append((state, next_state, action, reward, done))
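One variant I'm considering trying: keep the cached tensors on the CPU and only move a sampled batch to the GPU at learn time. This is a standalone sketch, not the tutorial's actual code; `device`, the buffer size, and the use of float32 for rewards are my assumptions, and the method bodies are written here as module-level functions just to keep the example self-contained:

```python
import random
from collections import deque

import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
memory = deque(maxlen=100_000)  # same capacity as self.memory

def cache(state, next_state, action, reward, done):
    """Store the experience on the CPU (no .to(device)) so the
    replay buffer cannot exhaust GPU memory as it grows."""
    memory.append((
        torch.tensor(np.array(state), dtype=torch.float32),
        torch.tensor(np.array(next_state), dtype=torch.float32),
        torch.tensor([action], dtype=torch.int64),
        torch.tensor([reward], dtype=torch.float32),
        torch.tensor([done], dtype=torch.bool),
    ))

def recall(batch_size):
    """Sample a batch and move only that batch to the device,
    just for the duration of the update step."""
    batch = random.sample(memory, batch_size)
    state, next_state, action, reward, done = map(torch.stack, zip(*batch))
    return tuple(t.to(device) for t in (state, next_state, action, reward, done))
```

With this layout only `batch_size` transitions ever occupy GPU memory at once, instead of the whole deque.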
... [CODE OMITTED] ...