Debugging GPU memory leaks can be tricky under a debugger

While debugging a program with a memory leak, I discovered that the leak was bigger when I was using the PyCharm debugger. I haven’t compared this with other debuggers, but GPU memory consumption was definitely much larger.

I tried a whole bunch of debugger settings, including “On demand”, but none seemed to make a difference.

(Note: this post has been edited to add this clarification. I originally blamed the PyCharm debugger outright, but as @googlebot pointed out in the comments, the same could just as well happen with any other debugger.)

The original post follows:

Is anybody using the PyCharm debugger with PyTorch programs? It leaks GPU memory like there’s no tomorrow. This sounds similar to the problem of the backtrace after an OOM in IPython not freeing GPU memory.

Unknowingly, I made the mistake of trying to debug a memory leak using PyCharm, not realizing that the PyCharm debugger itself holds on to tensors and doesn’t free them, even after a forced gc.collect(). I suppose this is by design: PyCharm keeps the variables around so the user can inspect them, and thus they can’t be freed until their frames are exited.

I discovered this while writing a script to reproduce a memory leak: only after I added GPU memory tracing to it using … did I notice that I was getting completely different measurements when running the same script with and without the debugger.
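The tracing tool used above isn’t named, so as a stand-in here is a minimal sketch of per-block GPU memory tracing built on PyTorch’s own counters (`torch.cuda.memory_allocated`, `torch.cuda.max_memory_allocated`, `torch.cuda.reset_peak_memory_stats`). The names `trace_gpu_mem` and `_cuda_ready` are hypothetical, and the guarded import is only there so the sketch still runs (reporting zeros) on a machine without PyTorch or CUDA.

```python
import contextlib

try:
    import torch
except ImportError:  # lets the sketch run (reporting zeros) without PyTorch
    torch = None


def _cuda_ready():
    return torch is not None and torch.cuda.is_available()


def _allocated():
    """Bytes currently held by live tensors (0 without CUDA)."""
    if not _cuda_ready():
        return 0
    torch.cuda.synchronize()  # make sure pending kernels have finished
    return torch.cuda.memory_allocated()


@contextlib.contextmanager
def trace_gpu_mem(label, log=None):
    """Report how much CUDA memory the wrapped block kept and peaked at."""
    if _cuda_ready():
        torch.cuda.reset_peak_memory_stats()
    before = _allocated()
    record = {"label": label}
    try:
        yield record
    finally:
        after = _allocated()
        peak = torch.cuda.max_memory_allocated() if _cuda_ready() else 0
        record["kept_bytes"] = after - before          # still referenced
        record["peak_bytes"] = max(peak - before, 0)   # transient high-water mark
        if log is not None:
            log.append(record)
        print(f"{label}: kept {record['kept_bytes'] / 2**20:.1f} MiB, "
              f"peak +{record['peak_bytes'] / 2**20:.1f} MiB")
```

Wrapping the suspect code in `with trace_gpu_mem("forward"):` and running the script once normally and once under the debugger makes the discrepancy visible: a block whose "kept" number grows only under the debugger points at references the debugger is holding rather than at the code itself.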

I tried a whole bunch of settings, but none seemed to help. If you know a way to tell PyCharm not to store intermediate variables, please share, but somehow I doubt it’s even possible.

So if you’re trying to debug a PyTorch program under PyCharm and end up with an OOM, this is why.

  1. Use “Variable loading policy” = on demand.
  2. IPython’s history may keep tensors alive (underscore variables like _, _10). In practice, I almost never have issues caused by that, but your console usage patterns may be different. So disabling IPython may help (I don’t know how to disable only the history).
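On the history point: IPython’s `%reset` magic accepts `out` to clear only the output history (`Out`, `_`, `__`, `_N`), and, per the IPython configuration docs, setting `InteractiveShell.cache_size = 0` in an IPython config file disables the output cache altogether. A minimal sketch that clears it from code and then lets Python and PyTorch reclaim memory; the function name is hypothetical and the guarded imports only keep it runnable outside IPython or without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # keeps the sketch runnable without PyTorch
    torch = None

try:
    from IPython import get_ipython
except ImportError:  # plain Python without IPython installed
    def get_ipython():
        return None


def drop_ipython_output_history():
    """Clear IPython's output history, then collect garbage.

    Returns True if an IPython shell was found and its output history reset.
    """
    shell = get_ipython()
    if shell is None:  # not running inside IPython/Jupyter
        return False
    # Same as typing `%reset -f out` in the console.
    shell.run_line_magic("reset", "-f out")
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
    return True
```

This only removes references held by the output cache; tensors still bound to your own variables stay alive.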

If you don’t pause or use breakpoints, I don’t see how PyCharm would allocate CUDA memory.

Thank you for your follow-up, @googlebot.

Use “Variable loading policy” = on demand.

As I mentioned, I tried many different options, including this one, to no avail.

IPython’s history may keep tensors alive (underscore variables like _, _10)

No, this has to do with IPython not releasing GPU RAM after an OOM, a huge problem for Jupyter users. A fix was proposed almost two years ago, but it has never been integrated:

As a workaround, I run a cell containing 1/0 right after the cell that hit the OOM.
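Why a 1/0 cell helps: when IPython displays a traceback it stores the exception in `sys.last_type`, `sys.last_value` and `sys.last_traceback`, and that traceback references every frame of the failed cell together with its locals, including any CUDA tensors. A new, cheap exception overwrites those slots, so the old frames become collectable. A minimal plain-Python sketch of that one mechanism (IPython may hold other references too); `BigBuffer` is a hypothetical stand-in for a large tensor, and `run_cell` is a simplified, hypothetical model of the shell, so no GPU is needed.

```python
import gc
import sys
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large CUDA tensor."""


def train_step(refs):
    buf = BigBuffer()               # a local that should die with its frame
    refs.append(weakref.ref(buf))   # lets us observe when it is freed
    raise MemoryError("simulated CUDA OOM")


def run_cell(fn, *args):
    """Simplified model of a shell running a cell and keeping its traceback."""
    try:
        fn(*args)
    except Exception:
        # Like IPython: remember the last exception (and its frames).
        sys.last_type, sys.last_value, sys.last_traceback = sys.exc_info()


refs = []
run_cell(train_step, refs)
gc.collect()
print("alive after OOM cell:", refs[0]() is not None)  # True

run_cell(lambda: 1 / 0)  # the "1/0 cell" replaces sys.last_*
gc.collect()
print("alive after 1/0 cell:", refs[0]() is not None)  # False
```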

If you don’t pause or use breakpoints, I don’t see how PyCharm would allocate CUDA memory.

Right, so basically you’re saying: don’t use the PyCharm debugger.

It’s not allocating CUDA memory; it prevents variables from being freed and garbage-collected by gc.collect(), and thus prevents their memory from being released.
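The mechanism can be reproduced without a GPU. Debuggers typically hook into execution through a trace function (`sys.settrace`), and any tool that keeps a reference to a frame also keeps that frame’s locals alive, so `gc.collect()` cannot free them. The sketch below uses a toy tracer that simply remembers every frame it sees; it is a hypothetical illustration of the effect, not how PyCharm’s debugger is actually implemented, and `BigTensor` stands in for a large CUDA tensor.

```python
import gc
import sys
import weakref


class BigTensor:
    """Hypothetical stand-in for a large CUDA tensor."""


held_frames = []  # what a debugger's variables view might keep around


def toy_debugger(frame, event, arg):
    # Hypothetical tracer: remember every function frame that is entered.
    if event == "call":
        held_frames.append(frame)
    return None  # no per-line tracing needed for this demo


def forward(refs):
    activations = BigTensor()
    refs.append(weakref.ref(activations))
    return 42


def run(traced):
    """Return True if forward()'s local is still alive after it returned."""
    refs = []
    sys.settrace(toy_debugger if traced else None)
    try:
        forward(refs)
    finally:
        sys.settrace(None)
    gc.collect()
    return refs[0]() is not None


print("without tracer:", run(traced=False))  # False
print("with tracer:   ", run(traced=True))   # True
held_frames.clear()  # dropping the frames finally frees the locals
```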

Your message made it sound like the debugger is somehow defective, which hasn’t been my experience. IPython/Jupyter’s leaks would happen regardless of PyCharm or any other debugger or IDE, wouldn’t they?


You’re right that I made a broad statement without comparing against other debuggers. I appreciate you flagging that, @googlebot; I’ve edited the first post to reflect it.

While there is definitely a similarity, I don’t think we can compare this to Jupyter/IPython. In IPython you can control which variables remain in scope, whereas debuggers do their own tracking, which you can’t control.

Unfortunately, I didn’t save the particular code that showed drastically different GPU memory usage with and without the debugger. I’ve now tried a few approaches, and the only correlation I found is with the number of breakpoints in frames holding huge variables on CUDA: the more breakpoints, the more extra memory is used.

I may get a chance to investigate this further, and if I do I will report back. But I’m certainly going to be wary of measuring memory usage under a debugger, and will check usage patterns outside of it.
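One way to avoid mistaking debugger-inflated numbers for real ones is to tag every memory measurement with whether a debugger appears to be attached. The sketch below is a heuristic under stated assumptions: classic debuggers install a trace function visible via `sys.gettrace()` (so do coverage tools, which will also trigger it), newer ones on Python 3.12+ may register under `sys.monitoring.DEBUGGER_ID` instead, and checking for a loaded `pydevd` module assumes that is PyCharm’s debugger backend. The function names are hypothetical.

```python
import sys


def debugger_likely_attached():
    """Heuristic: does this process appear to be running under a debugger?"""
    if sys.gettrace() is not None:  # sys.settrace-based debuggers (and tracers)
        return True
    monitoring = getattr(sys, "monitoring", None)  # Python 3.12+ (PEP 669)
    if monitoring is not None and monitoring.get_tool(monitoring.DEBUGGER_ID):
        return True
    return "pydevd" in sys.modules  # assumption: PyCharm's debugger backend


def log_gpu_mem(label, allocated_bytes):
    """Print a measurement tagged with how it was taken."""
    tag = "DEBUGGER" if debugger_likely_attached() else "plain"
    print(f"[{tag}] {label}: {allocated_bytes / 2**20:.1f} MiB")
```

In a PyTorch script this might be called as `log_gpu_mem("after step", torch.cuda.memory_allocated())`, so that numbers tagged `[DEBUGGER]` are never compared directly against `[plain]` ones.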