CUDA memory leak when following the 'Play Mario with RL' tutorial

I'm trying to run the 'Play Mario with RL' tutorial code in a Jupyter notebook. I had to modify the cache function because the original code threw an error when run verbatim. On CPU the code runs fine past episode 1000, but on my local GPU it always runs out of GPU memory around episode 560. Commenting out the cache function seems to stop the memory from filling up, so I'm guessing the problem is in there. Does anything look wrong at a quick glance? Thanks! 🙂

from collections import deque

import numpy as np
import torch

# DQN and device (e.g. torch.device("cuda")) are defined elsewhere in the notebook.

class Mario:
    def __init__(self, state_dim, action_dim, save_dir, checkpoint=None):
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.memory = deque(maxlen=100000)
        self.batch_size = 64

        self.exploration_rate = 1
        self.exploration_rate_decay = 0.99999975
        self.exploration_rate_min = 0.1
        self.gamma = 0.9

        self.curr_step = 0
        self.burnin = 1e5  # min. experiences before training
        self.learn_every = 3   # no. of experiences between updates to Q_online
        self.sync_every = 1e4   # no. of experiences between Q_target & Q_online sync

        self.save_every = 5e5   # no. of experiences between saving NN
        self.save_dir = save_dir

        # NN to predict the most optimal action - implemented in the Learn section
        self.net = DQN(self.state_dim, self.action_dim).float()
        self.net = self.net.to(device)

        if checkpoint:
            self.load(checkpoint)

        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.00025)
        self.loss_fn = torch.nn.SmoothL1Loss()

... [CODE OMITTED] ...


    def cache(self, state, next_state, action, reward, done):
        """
        Store the experience to self.memory (replay buffer)
        Inputs:
        state (LazyFrame),
        next_state (LazyFrame),
        action (int),
        reward (float),
done (bool)
        """
        
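        # Convert each experience component to a tensor and move it onto the GPU (device)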
        state = torch.FloatTensor(np.array(state)).to(device)
        next_state = torch.FloatTensor(np.array(next_state)).to(device)
        action = torch.LongTensor([action]).to(device)
        reward = torch.DoubleTensor([reward]).to(device)
        done = torch.BoolTensor([done]).to(device)
        
        self.memory.append( (state, next_state, action, reward, done,) )

... [CODE OMITTED] ...

I cannot see any obvious issues in the posted code snippet; however, it also doesn't show the overall usage. In the cache method you are appending device tensors to the deque, so an increase in GPU memory is expected while the buffer fills. Are you properly removing these objects from self.memory, and are you sure you are not storing any other references to these tensors?
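
If the buffer itself turns out to be the culprit, a common pattern is to cache experiences on the CPU and move only each sampled batch to the GPU when learning. A minimal sketch of that idea, assuming a global device as in your snippet and a recall method shaped like the tutorial's (the batch-stacking code below is an assumption, not taken from your notebook):

    import random

    import numpy as np
    import torch

    # Inside class Mario (device is the global torch.device from your snippet):
    def cache(self, state, next_state, action, reward, done):
        # Keep cached experiences on the CPU so the replay buffer
        # (up to 100,000 entries) lives in host RAM, not GPU memory.
        state = torch.FloatTensor(np.array(state))
        next_state = torch.FloatTensor(np.array(next_state))
        action = torch.LongTensor([action])
        reward = torch.DoubleTensor([reward])
        done = torch.BoolTensor([done])
        self.memory.append((state, next_state, action, reward, done))

    def recall(self):
        # Sample a batch and move only that batch onto the GPU.
        batch = random.sample(self.memory, self.batch_size)
        state, next_state, action, reward, done = map(torch.stack, zip(*batch))
        return (
            state.to(device),
            next_state.to(device),
            action.squeeze().to(device),
            reward.squeeze().to(device),
            done.squeeze().to(device),
        )

For scale: assuming the tutorial's 4×84×84 float32 frame stacks, each cached (state, next_state) pair is roughly 2 × 4 × 84 × 84 × 4 bytes ≈ 226 KB, so a full 100,000-entry buffer would need on the order of 22 GB, far more than a typical GPU has. A gradual out-of-memory error around episode 560 is consistent with the deque slowly filling with device tensors.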