Constant memory leak

Hi guys,

I am experiencing a memory leak in my PongDeterministicV4 PPO experiments. I can easily reproduce the constantly increasing memory usage with a half-way saved model, but I have not been able to locate the cause. With a fresh start, memory starts to increase significantly after about 100 episodes. I tried the pympler tool, and the memory used by other objects looked fine to me.

If you run the training with the half-way saved model, memory usage reaches 10 GB within 15 episodes.

The repository is here -
I will post the screenshots next. I spent a week trying to locate the issue but have had no luck yet. Thanks in advance.


I just tried training from scratch and counted object allocations by type. I can see that the number of tensors increases significantly over time. Most of the tensor usage is in - It is a simple file containing only a few lines of code, and I could not spot any suspicious misuse of tensors.
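For reference, one way to count live objects of a given type is to walk Python's garbage collector. In a real run you would pass `torch.Tensor` as the class; the `Marker` class and `count_instances` helper below are my own stand-ins so the sketch is self-contained:

```python
import gc

def count_instances(cls):
    """Count live objects of the given class currently tracked by the GC."""
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

class Marker:
    # Stand-in for torch.Tensor; any GC-tracked class works the same way.
    def __init__(self, value):
        self.value = value

leaked = []  # simulates a buffer that is appended to but never cleared
before = count_instances(Marker)
leaked.extend(Marker(i) for i in range(100))
after = count_instances(Marker)
print(after - before)  # → 100
```

Calling this periodically during training makes a steadily growing count (rather than a flat or sawtooth pattern) easy to spot.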

I’m not familiar with the code and I’m not sure which method is called how often, but in the file you’ve linked there is a function which adds tensors to lists: line of code.

Could you somehow check if and when the clear function is called?
If tensors keep being added to the lists, your memory will grow continuously.
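To illustrate the pattern (the class and method names here are hypothetical, not the actual code): if a buffer's `add` is called every step but `clear` is only reached under some condition, memory grows without bound whenever that condition stops firing.

```python
class TrajectoryBuffer:
    """Minimal sketch of a list-backed trajectory buffer."""
    def __init__(self):
        self.states = []

    def add(self, state):
        self.states.append(state)  # called every step

    def clear(self):
        self.states.clear()  # must actually be reached, or the list grows forever

buf = TrajectoryBuffer()
flush_condition = False  # suppose the flush condition never triggers...
for step in range(1000):
    buf.add(step)
    if flush_condition:
        buf.clear()
print(len(buf.states))  # → 1000: every step is retained
```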

It’s usually a good idea to post code directly, as this makes searching easier in the forum in case someone else has a similar issue. You can post code using three backticks ``` :wink:

Thanks @ptrblck. I think I got it resolved. After a several-week battle, it is finally fixed. :v:

Good to hear!
What was the issue? Could you post a short description so that others won’t run into the same problems?

@ptrblck, it was my mistake. I cut episodes off at t_max=1000. After some time, as the agents get more intelligent, they no longer all finish within 1000 steps. See the code comment on ‘if np.any(dones): # no one has dones after some time’: once no environment reaches done, the learning step is never entered, and self.parallel_trajectory keeps accumulating.

```python
def step(self, i_episode, states, actions, action_probs, rewards, next_states, dones):
    ...
    if np.any(dones):  # no one has dones after some time
        states, actions, action_probs, rewards, next_states, dones = self.parallel_trajectory.numpy()
        returns = self.parallel_trajectory.discounted_returns(...)
        states_tensor, actions_tensor, action_probs_tensor, returns_tensor, next_states_tensor = self.to_tensor(...)
        del self.parallel_trajectory
        self.parallel_trajectory = ParallelTrajectory(n=self.num_parallels)
```
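The fix is to flush and learn not only when some environment reports done, but also when the step cap is hit. A minimal sketch of that condition (the function name and signature are mine, not from the repo):

```python
import numpy as np

def should_flush(dones, t, t_max=1000):
    # Learn/flush when any env finishes OR the episode hits the step cap,
    # so the trajectory buffer cannot accumulate indefinitely.
    return bool(np.any(dones)) or t >= t_max
```

With this guard in place of the bare `np.any(dones)` check, the buffer is cleared at least every `t_max` steps even when no agent ever reaches a terminal state.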