[RuntimeError: CUDA out of memory] I have more GPU memory than it needs

I made a similar post before, but I couldn't get the answer I wanted, so I'm re-posting.
I'm using DataParallel with 4 GPUs (a minimal sketch of the setup is below, after the error messages), and I get these out-of-memory errors:

[case 1]
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 0; 47.99 GiB total capacity; 24.39 GiB already allocated; 20.41 GiB free; 24.71 GiB reserved in total by PyTorch)

[case 2]
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 2; 47.99 GiB total capacity; 13.14 GiB already allocated; 31.59 GiB free; 13.53 GiB reserved in total by PyTorch)
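For context, the setup is essentially the standard DataParallel pattern; the model, sizes, and batch below are placeholders rather than my actual code:

```python
import torch
import torch.nn as nn

# Placeholder model and batch size -- only meant to show how DataParallel
# is wired up across the 4 GPUs, not my real network.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10))
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()

inputs = torch.randn(256, 4096).cuda()  # batch gets split across the 4 devices
outputs = model(inputs)                 # forward pass replicates the model
loss = outputs.sum()
loss.backward()                         # gradients are gathered back onto GPU 0
```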

In both cases, my GPU has enough free memory, yet it fails to allocate a much smaller amount.
In another post, I saw how the "available" free memory is calculated:
free memory = reserved memory - allocated memory
This makes me suspect that PyTorch didn't reserve enough memory even though more is available. But then I don't know how to make PyTorch reserve more memory, since it assigns it automatically.
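For reference, the numbers the allocator tracks can be printed per device like this (a minimal sketch; the loop just assumes all 4 GPUs are visible to the process):

```python
import torch

# Inspect the caching allocator on each visible device.
# "free-in-reserve" below corresponds to reserved - allocated,
# i.e. the formula mentioned above.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i)  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(i)    # bytes cached by the allocator
    print(
        f"GPU {i}: allocated={allocated / 1024**3:.2f} GiB, "
        f"reserved={reserved / 1024**3:.2f} GiB, "
        f"free-in-reserve={(reserved - allocated) / 1024**3:.2f} GiB"
    )
```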

Any advice or suggestions would be much appreciated.
Thanks for reading!

Could you check the output of nvidia-smi in these cases? Would it be possible that another process or instance of PyTorch is also using memory?
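You could also compare what the driver reports with what your process has reserved, for example with something like this (a rough sketch, assuming all 4 devices are visible to the process):

```python
import torch

# torch.cuda.mem_get_info reports free/total memory as seen by the CUDA
# driver, so memory held by *other* processes (plus context overhead)
# shows up in the difference computed below.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    reserved = torch.cuda.memory_reserved(i)  # held by this process's allocator
    other = total - free - reserved
    print(
        f"GPU {i}: driver-free={free / 1024**3:.2f} GiB, "
        f"reserved by this process={reserved / 1024**3:.2f} GiB, "
        f"used outside this process≈{other / 1024**3:.2f} GiB"
    )
```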

I am monitoring GPU memory with nvidia-smi, and it never gets close to the maximum: at most about 33 GiB / 48 GiB, on all of the GPUs.

Are there any other processes contending for GPU memory or just the single PyTorch instance here?

I was only running a single PyTorch process.