RAM Issues during training

I’m experiencing severe memory leaks in my DDPG implementation, found here.

Even after just the first couple of episodes, my RAM usage goes up by almost a whole GB.

I’ve tried disabling different parts of my code: wrapping inference in no_grad() wherever possible, limiting my replay buffer to only 500 entries (which on its own can’t account for the memory usage I’m seeing), and not training the agent at all, only doing inference.
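For context, the inference path looks roughly like this after those changes (the names here are simplified placeholders, the real code is in the linked repo):

```python
import torch

def select_action(actor, state, device="cpu"):
    # "actor" stands in for the policy network from the linked repo
    state = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    with torch.no_grad():  # no autograd graph is built for pure inference
        action = actor(state)
    return action.squeeze(0).cpu().numpy()
```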

I’ve also tried other versions of torch, such as 1.3, but the result is the same.

Please help, any direction is appreciated. I understand that this is not a debugging session, but I’d appreciate any help at all. :)

A quick glance at the code does not reveal anything.
Given how your replay buffer is implemented, it is expected to grow for a bit until it is full.

Could you be clearer about what you tried and the effect?
Did you mean that when you don’t train, you don’t see this behavior?

Thanks for the very swift reply.

What I’m trying to say is that my replay buffer has no chance to accumulate to a size where it would even become a problem before my computer runs out of memory. I determined that the replay buffer is not the problem by limiting its capacity to only a few hundred entries; since I’m only storing (96, 96) 2D arrays in it, a few hundred of those can’t account for the memory growth I’m seeing.
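As a quick sanity check on the numbers (assuming the observations are stored as float32; even float64 only doubles this):

```python
import numpy as np

capacity = 500                         # capped buffer size
obs = np.zeros((96, 96), dtype=np.float32)
total_bytes = capacity * obs.nbytes
print(f"{total_bytes / 1e6:.1f} MB")   # ~18.4 MB, nowhere near a GB per episode
```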

When I don’t train, I still see this behaviour. By not training, I mean that I comment out all parts of the code that attempt any form of backprop or loss computation, so the only time the network is invoked is for the forward pass.

  • And if you replace the forward pass by just returning a random output?
  • And if you replace all the calls to the gym simulator with calls that just return random values? (A rough sketch of both stubs is below.)
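For example, something along these lines (the shapes and action size are placeholders, not taken from your code) would let you run the loop with both the network and the simulator stubbed out and watch whether RAM still climbs:

```python
import numpy as np

def fake_forward(state):
    # stand-in for actor(state): same kind of output, no network involved
    return np.random.uniform(-1.0, 1.0, size=3)

def fake_env_step(action):
    # stand-in for env.step(action): random (96, 96) observation, dummy reward/done/info
    return np.random.rand(96, 96).astype(np.float32), 0.0, False, {}

state = np.random.rand(96, 96).astype(np.float32)
for t in range(10_000):
    action = fake_forward(state)
    state, reward, done, info = fake_env_step(action)
    # if RAM still climbs here, the leak is in the surrounding loop/buffer code,
    # not in PyTorch or in the simulator
```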

Hello alban,

Thanks for helping me with my problem! The problem went away with a clean install of PyTorch and my CUDA/cuDNN dependencies!


Awesome! That’s a simple enough solution!