Memory leak persisting after closing python?

I just had a memory leak that persisted, with 90% of the GPU memory still occupied, AFTER I closed Python and ran killall python + killall jupyter for good measure. nvidia-smi also showed the 90% GPU memory usage, but no corresponding processes actually using it…

How are things like this possible? How do I prevent them? And how do I diagnose them?
The only recourse I knew of at the time was to reboot. But I really need some context on this problem…

P.S. I just got this to happen again by interrupting a jupyter notebook cell during the training process. So that may be related. But still I’m not sure how to deal with these issues in the long run.

Interrupting a process is often the root cause of this kind of failed cleanup on exit. In my experience I haven’t seen dead processes often when working directly in the terminal (unless the code crashes) and have seen them more often in Jupyter (but I’m also not a heavy Jupyter user, so others might have a different experience).

I see, so in general it’s not a given that a process needs to be open to cause a memory leak?
Is there some way to deal with these after they’ve occurred aside from rebooting?

Also, out of curiosity, what do you prefer over Jupyter?

I wouldn’t claim it’s a memory leak, but think it’s a dead process which still holds the memory.

Finding the dead process via ps aux and killing it.
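
Something along these lines usually works on Linux (the grep pattern is just an example, and <PID> is a placeholder for whatever PID the first command reports):

    # list all processes and filter for likely suspects
    ps aux | grep -i -E 'python|jupyter'

    # terminate the stale process once you’ve identified its PID
    kill -9 <PID>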

I’m not a researcher and am more focused on the backend as well as debugging, so I wouldn’t necessarily recommend sticking to my setup, but:

  • plain vi for remote work
  • Spyder for quick “frontend” work (coming from the signal processing domain with MATLAB, Spyder felt familiar but I think it’s not hugely popular)
  • VS Code (+ remote access if needed)

Thanks for the reply! Interesting choice of IDEs, I’ve been leaning that way too.
But regarding killing the process: I already tried killall python and killall jupyter. Do you think it’s some other process, then?

I’m not sure what the exact process name would be, so I would recommend checking all processes in case it’s jupyter-kernel or something similar.
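
If nothing obvious shows up by name, another option (assuming fuser from the psmisc package is installed) is to list every process that still has the NVIDIA device files open; whatever is holding the leaked GPU memory should appear there:

    # show processes with open handles on the GPU device files
    sudo fuser -v /dev/nvidia*

    # then kill the stale PID(s) it reports
    sudo kill -9 <PID>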


Thanks I’ll try that next time!

Try passing the -9 flag to killall, so that it sends the stronger SIGKILL, which cannot be caught or blocked, e.g. killall -9 jupyter. By default killall sends the gentler SIGTERM, which a hung process can ignore.

Killing all python processes may not be advisable, since you may end up disrupting non-ML programs. A better idea is to find the PIDs of your ML Python programs using ps -u $USER and kill only the right ones with kill -9 <PID of the python process that you want to kill>.
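
As a rough sketch (the extra output columns are just an example to help spot the stale training job by its elapsed time):

    # list only your own processes with PID, elapsed time, and command line
    ps -u $USER -o pid,etime,cmd | grep -i python

    # then kill only the PID you actually want gone
    kill -9 <PID>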

Idc; if I’m allowed to kill processes which shouldn’t be killed, then that’s the admin’s fault. killall is simpler and makes sure no unseen zombie processes slip through.

Thanks for the tip about the -9 flag, though. It’s weird; since it’s called killall, I thought it would send SIGKILL by default. Good to know.