How to ensure GPU memory is freed at the end of a program?

TinfoilHat0 · April 27, 2020, 11:49am

Hi,

I’m using a bash script to run a grid search over hyperparameters for my PyTorch code.
It looks something like this:

for lr in 0.1 0.01 0.001
do
    for wd in 0.1 0.01 0.001
    do
        # call the PyTorch script with the current hyperparameters
        python my_pytorch_script.py --lr="$lr" --wd="$wd"
    done
done

I noticed that a process sometimes finishes without freeing its GPU memory. Eventually the GPU runs out of memory (OOM), even though I execute the runs sequentially.

How can I fix this?

donJuan · April 27, 2020, 1:11pm

Maybe some cached GPU memory is not being released? Try calling torch.cuda.empty_cache().
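
A minimal sketch of that suggestion, assuming the script can call a helper as its last step. The helper name release_cached_gpu_memory is made up for illustration, and the import guard is only there so the snippet also runs on a machine without PyTorch. Keep in mind that empty_cache() only returns cached blocks that no live tensor occupies, and that a process which really exits hands back all of its GPU memory anyway, so this mainly helps while the process is still running.

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def release_cached_gpu_memory() -> bool:
    """Return cached, unused CUDA blocks to the driver.

    Returns True if the cache was emptied, False if PyTorch or CUDA
    is not available. (Hypothetical helper, not part of PyTorch.)
    """
    if torch is None or not torch.cuda.is_available():
        return False
    # Only blocks that no live tensor occupies can be released, so drop
    # references first (del model, del optimizer, ...) before calling this.
    torch.cuda.empty_cache()
    return True


if __name__ == "__main__":
    # ... the training code would run here ...
    print("CUDA cache emptied:", release_cached_gpu_memory())
```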

ptrblck · April 28, 2020, 4:41am

Did each script run exit cleanly, or did you see any errors?
If the GPU memory is not released after finishing the execution, some zombie processes might be alive.
Could you check it via htop or ps?
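
If you would rather script the check than eyeball htop, something like the sketch below could work. It assumes a Linux-style ps that accepts "-eo pid,stat,args"; the function names are made up for illustration, and my_pytorch_script.py is taken from your loop.

```python
import subprocess


def find_leftover_processes(ps_output, script_name):
    """Return (pid, state, command) for every ps line that mentions script_name."""
    matches = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        parts = line.strip().split(None, 2)   # pid, state, full command line
        if len(parts) == 3 and script_name in parts[2]:
            matches.append((int(parts[0]), parts[1], parts[2]))
    return matches


def list_leftover_processes(script_name):
    """Run ps and filter its output for script_name."""
    # "ps -eo pid,stat,args" prints PID, process state and command line
    # for every process on the machine.
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,args"],
        capture_output=True, text=True, check=True,
    )
    return find_leftover_processes(result.stdout, script_name)
```

Calling list_leftover_processes("my_pytorch_script.py") after the loop has finished should return an empty list. Any entry that is still there is a candidate for holding GPU memory, and a state starting with "Z" marks a defunct process. nvidia-smi also lists the PIDs that currently hold GPU memory, which you can cross-check against this output.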

TinfoilHat0 · April 28, 2020, 6:32pm

Hi @ptrblck

Indeed, there seem to be zombie processes. Btw, the scripts called earlier run fine; however, the later ones fail with an error because the GPU runs out of memory.
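
A possible workaround, as a rough sketch only: drive the grid search from Python, start each run in its own process group, and kill whatever is left in that group once the run returns. This assumes the leftover processes are children of each run (data-loading workers, for instance), which has not been confirmed in this thread. It is POSIX-only, and run_isolated and grid_search are made-up names. A process that moves itself into a new session would escape the cleanup.

```python
import itertools
import os
import signal
import subprocess
import sys


def run_isolated(cmd):
    """Run cmd in its own process group, then kill anything left in that group."""
    # start_new_session=True makes the child a group leader, so its
    # process group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        returncode = proc.wait()
    finally:
        try:
            # Terminate stray children that outlived the main process.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # nothing left in the group
    return returncode


def grid_search(script="my_pytorch_script.py"):
    """Same lr/wd grid as the bash loop, one isolated run per setting."""
    values = ["0.1", "0.01", "0.001"]
    for lr, wd in itertools.product(values, values):
        code = run_isolated([sys.executable, script, f"--lr={lr}", f"--wd={wd}"])
        if code != 0:
            # Stop early instead of letting a failed run affect later ones.
            raise SystemExit(f"lr={lr} wd={wd} exited with status {code}")
```

Calling grid_search() replaces the bash loop. Stopping on the first non-zero exit status also makes it obvious which setting failed first, rather than only seeing the OOM errors from the later runs.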