I am experiencing memory leak in my PongDeterministicV4 PPO experiments. I can recreate the constant memory increasing issue with the half-way saved model easily, but I could not be able to locate the issue. With brand new start, the memory start to increase significantly after 100 episodes. I tried pympler tool and it looked to me that the memory used by other objects are ok.
If you run the training with half-way saved model, you will be able to get 10GB memory usage within 15 episodes.
The repository is here - https://github.com/weicheng113/PongPPO
I will paste the screenshots next. I spent a week trying to locate the issue, but have not got any luck yet. Thanks in advance.
Just tried to train freshly and count different object increment. I can see the number of tensors increased significantly over time. The most use of tensors is in ppo_agent.py - https://github.com/weicheng113/PongPPO/blob/master/ppo_agent.py. It is a simple file, only containing a few lines of code. I could not see any suspicious mis-use of tensors.
I’m not familiar with the code and I’m not sure, which method is called how often, but in the file you’ve linked there is a function which adds
tensors to lists: line of code.
Could you check somehow, if and when the
clear function is called?
If the tensors are added to the lists, your memory will grow continuously.
It’s usually a good idea to post code directly, as this makes the search easier in the forum in case someone else has a similar issue. You can post code using three backticks `
Thanks @ptrblck. I think I got it resolved. After several weeks battle, finally got it resolved.
Good to hear!
What was the issue? Could you post a short description so that others won’t run into the same problems?
@ptrblck, It was my mistake. I cut episode by t_max=1000. After some time, as the agent is getting more intelligent, they all don’t get done after 1000 steps. see the code comment on ‘if np.any(dones): // no one has dones after some time.’. So it will not enter learning step after time, and self.parallel_trajectory keeps accumulating.
def step(self, i_episode, states, actions, action_probs, rewards, next_states, dones):
if np.any(dones): **// no one has dones after some time.**
states, actions, action_probs, rewards, next_states, dones = self.parallel_trajectory.numpy()
returns = self.parallel_trajectory.discounted_returns(self.discount)
states_tensor, actions_tensor, action_probs_tensor, returns_tensor, next_states_tensor = self.to_tensor(
self.parallel_trajectory = ParallelTrajectory(n=self.num_parallels)