Constant memory leak

Hi Guys,

I am experiencing a memory leak in my PongDeterministicV4 PPO experiments. I can easily reproduce the constantly increasing memory usage with a half-way saved model, but I have not been able to locate the cause. With a fresh start, memory begins to increase significantly after about 100 episodes. I tried the pympler tool, and the memory used by other (non-tensor) objects looked fine to me.
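For reference, this is roughly how I compared object counts between episodes with pympler (a minimal, self-contained sketch; the toy `leaky_list` just stands in for whatever accumulates in the real training loop):

```python
from pympler import tracker

mem_tracker = tracker.SummaryTracker()
leaky_list = []

for i_episode in range(3):
    # stand-in for one training episode that accidentally keeps references around
    leaky_list.extend(bytearray(1024) for _ in range(1000))
    print(f"after episode {i_episode}:")
    mem_tracker.print_diff()  # prints object types whose count/size grew since the last call
```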

If you run the training with the half-way saved model, memory usage reaches 10 GB within 15 episodes.

The repository is here - https://github.com/weicheng113/PongPPO
I will paste the screenshots next. I have spent a week trying to locate the issue, but have not had any luck yet. Thanks in advance.

Cheng




I just tried training from scratch and counted the increase of different object types. I can see the number of tensors increasing significantly over time. Most of the tensor usage is in ppo_agent.py - https://github.com/weicheng113/PongPPO/blob/master/ppo_agent.py. It is a simple file containing only a few lines of code, and I could not spot any suspicious misuse of tensors.
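The tensor counting was along these lines (a rough sketch of the check, not the exact logging code in the repo):

```python
import gc
from collections import Counter

import torch


def count_live_tensors():
    """Count live tensors tracked by the garbage collector, keyed by dtype and shape."""
    counts = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                counts[(obj.dtype, tuple(obj.shape))] += 1
        except Exception:
            continue  # some objects raise on attribute access; skip them
    return counts


# call this every few episodes and diff the results to see which tensors accumulate
print(sum(count_live_tensors().values()), "tensors alive")
```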

I’m not familiar with the code and I’m not sure which method is called how often, but in the file you’ve linked there is a function which adds tensors to lists: line of code.

Could you check somehow if and when the clear function is called?
If tensors keep being added to these lists, your memory will grow continuously.
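To illustrate the failure mode with toy code (not your actual classes): if the buffer’s clear/reset is never reached, every step adds another tensor and memory grows without bound.

```python
import torch


class Trajectory:
    """Toy buffer that accumulates one tensor per step."""
    def __init__(self):
        self.states = []

    def add(self, state):
        self.states.append(state)   # keeps a reference alive on every call

    def clear(self):
        self.states.clear()


buffer = Trajectory()
for step in range(10_000):
    buffer.add(torch.randn(4, 80, 80))   # stand-in for stacked frames
    if (step + 1) % 1_000 == 0:
        buffer.clear()   # if this line is never reached, the list grows forever
```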

It’s usually a good idea to post the code directly, as this makes it easier to search the forum in case someone else runs into a similar issue. You can post code using three backticks ` :wink:

Thanks @ptrblck. I think I got it resolved. After a battle of several weeks, it is finally fixed. :v:

Good to hear!
What was the issue? Could you post a short description so that others won’t run into the same problems?

@ptrblck, it was my mistake. I cut episodes off at t_max=1000 steps. After some time, as the agents get more intelligent, none of them finishes within 1000 steps (see the comment on `if np.any(dones):` in the code below). The learning step is therefore never entered any more, and self.parallel_trajectory keeps accumulating.

```python
def step(self, i_episode, states, actions, action_probs, rewards, next_states, dones):
    # accumulate one step from all parallel environments
    self.parallel_trajectory.add(
        parallel_states=states,
        parallel_actions=actions,
        parallel_action_probs=action_probs,
        parallel_rewards=rewards,
        parallel_next_states=next_states,
        parallel_dones=dones)

    # the leak: after some time no environment is done within t_max steps,
    # so this branch is never entered and the trajectory keeps growing
    if np.any(dones):
        states, actions, action_probs, rewards, next_states, dones = self.parallel_trajectory.numpy()
        returns = self.parallel_trajectory.discounted_returns(self.discount)
        states_tensor, actions_tensor, action_probs_tensor, returns_tensor, next_states_tensor = self.to_tensor(
            states=states,
            actions=actions,
            action_probs=action_probs,
            returns=returns,
            next_states=next_states)
        self.learn(
            states=states_tensor,
            actions=actions_tensor,
            action_probs=action_probs_tensor,
            returns=returns_tensor,
            next_states=next_states_tensor)
        # reset the trajectory buffer after learning
        del self.parallel_trajectory
        self.parallel_trajectory = ParallelTrajectory(n=self.num_parallels)
```
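For completeness, the direction of the fix (just a sketch; `t_max` and `trajectory_len` are illustrative names, not necessarily the exact API in the repo) was to also trigger learning when the buffer reaches the step cap, so it can no longer grow unbounded:

```python
import numpy as np


def should_learn(dones, trajectory_len, t_max=1000):
    """Learn when any episode ends OR the buffer has reached the step cap."""
    return bool(np.any(dones)) or trajectory_len >= t_max


# even when no environment finishes within t_max steps,
# the length condition still fires and the trajectory gets flushed
print(should_learn(dones=np.array([False, False]), trajectory_len=1000))  # True
```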