[RuntimeError: CUDA out of memory] I have larger gpu memory than it needs

C_J · April 21, 2022, 8:45pm

I made a similar post before, but I couldn’t get the answer I wanted, so I’m re-posting.
I’m using DataParallel with 4 GPUs, and I get these out-of-memory errors:

[case 1]
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 0; 47.99 GiB total capacity; 24.39 GiB already allocated; 20.41 GiB free; 24.71 GiB reserved in total by PyTorch)

[case 2]
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 2; 47.99 GiB total capacity; 13.14 GiB already allocated; 31.59 GiB free; 13.53 GiB reserved in total by PyTorch)

In both cases, the GPU has enough free memory, yet it fails to allocate a much smaller amount.
In another post, I saw how the “available” free memory is calculated:
free memory = reserved memory - allocated memory
This makes me suspect that PyTorch didn’t reserve enough memory even though more is available. But I don’t know how to make PyTorch reserve more memory, since it manages the reservation automatically.
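
To check that formula against the two messages above, here is a minimal sketch that parses the quoted error text and computes reserved minus allocated. The regular expression and the helper name `summarize_oom` are illustrative assumptions that match only the message format quoted in this post; they are not part of PyTorch.

```python
import re

# Pattern for the message format quoted in this post (an assumption;
# other PyTorch versions may word the message differently).
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def summarize_oom(message):
    """Hypothetical helper: extract the GiB figures from one OOM message."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    fields = {k: float(v) for k, v in match.groupdict().items() if k != "gpu"}
    fields["gpu"] = int(match.group("gpu"))
    # The quantity the formula above calls "free memory".
    fields["reserved_minus_allocated"] = round(
        fields["reserved"] - fields["allocated"], 2
    )
    return fields


case_1 = (
    "RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB "
    "(GPU 0; 47.99 GiB total capacity; 24.39 GiB already allocated; "
    "20.41 GiB free; 24.71 GiB reserved in total by PyTorch)"
)
case_2 = (
    "RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB "
    "(GPU 2; 47.99 GiB total capacity; 13.14 GiB already allocated; "
    "31.59 GiB free; 13.53 GiB reserved in total by PyTorch)"
)

if __name__ == "__main__":
    for message in (case_1, case_2):
        info = summarize_oom(message)
        print(
            f"GPU {info['gpu']}: reserved - allocated = "
            f"{info['reserved_minus_allocated']} GiB, "
            f"reported free = {info['free']} GiB, "
            f"requested = {info['requested']} GiB"
        )
```

For both messages, reserved minus allocated comes out well under 1 GiB, while the reported "free" value is 20.41 GiB and 31.59 GiB. That suggests the "free" number in these messages is not computed as reserved minus allocated.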

Any advice or suggestions would be much appreciated.
Thanks for reading!

eqy · April 21, 2022, 9:34pm

Could you check the output of nvidia-smi in these cases? Would it be possible that another process or instance of PyTorch is also using memory?
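
One way to answer this is to list the compute processes that currently hold GPU memory. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-compute-apps` CSV output; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration.

```python
import csv
import io
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Hypothetical helper: turn the CSV text into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = (field.strip() for field in record)
        rows.append(
            {
                "pid": int(pid),
                "name": name,
                # Some platforms report a non-numeric placeholder here.
                "used_mib": int(used) if used.isdigit() else None,
            }
        )
    return rows


def list_gpu_processes():
    """Run nvidia-smi if available; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc)
```

If more than one PID appears, or a PID that does not belong to the training script, another process is contending for GPU memory.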

C_J · April 22, 2022, 5:35am

I am monitoring the GPU memory using nvidia-smi, and it never gets even close to the maximum.
It goes up to 33GiB / 48GiB on all of the GPUs.

eqy · April 22, 2022, 6:01am

Are there any other processes contending for GPU memory or just the single PyTorch instance here?

C_J · April 22, 2022, 6:08am

I was only running a single PyTorch instance.