I have made a similar post, but I couldn’t get the answer I wanted, so I’m re-posting.
I’m using DataParallel with 4 GPUs, and I get these out-of-memory errors:
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 0; 47.99 GiB total capacity; 24.39 GiB already allocated; 20.41 GiB free; 24.71 GiB reserved in total by PyTorch)
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 2; 47.99 GiB total capacity; 13.14 GiB already allocated; 31.59 GiB free; 13.53 GiB reserved in total by PyTorch)
In both cases, the GPU reports plenty of free memory (20.41 GiB and 31.59 GiB), yet PyTorch cannot allocate a much smaller 3.62 GiB block.
In another post, I saw how the “available” free memory is calculated:
free memory = reserved memory - allocated memory
This makes me suspect that PyTorch didn’t reserve enough memory, even though more is available on the GPU. But I don’t know how to make PyTorch reserve more memory, since it manages reservations automatically.
Any advice or suggestions would be much appreciated.
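As a quick check, here is a small Python sketch that pulls the figures out of the two messages above and applies the formula from that other post (reserved minus allocated). The helpers `parse_oom` and `check` are my own hypothetical names, not PyTorch APIs. For GPU 0 the formula gives 24.71 - 24.39 = 0.32 GiB, and for GPU 2 it gives 13.53 - 13.14 = 0.39 GiB. Both values are far below the 3.62 GiB request and also far from the 20.41 GiB and 31.59 GiB reported as “free”. So the “free” figure in the message does not seem to be reserved minus allocated.

```python
import re

# The two error messages quoted above (GPU 0 and GPU 2).
ERRORS = [
    "RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 0; 47.99 GiB total capacity; "
    "24.39 GiB already allocated; 20.41 GiB free; 24.71 GiB reserved in total by PyTorch)",
    "RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 2; 47.99 GiB total capacity; "
    "13.14 GiB already allocated; 31.59 GiB free; 13.53 GiB reserved in total by PyTorch)",
]

# Matches the message layout shown above; other PyTorch versions may word it differently.
PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def parse_oom(message):
    """Hypothetical helper: return (gpu index, dict of GiB figures) from one OOM message."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("unrecognized OOM message")
    fields = m.groupdict()
    gpu = int(fields.pop("gpu"))
    return gpu, {name: float(value) for name, value in fields.items()}


def check(message):
    """Hypothetical helper: compare the reported 'free' value with reserved - allocated."""
    gpu, f = parse_oom(message)
    # Formula from the other post: memory reserved by PyTorch but not held by tensors.
    unused_reserved = round(f["reserved"] - f["allocated"], 2)
    return {
        "gpu": gpu,
        "request": f["request"],
        "reported_free": f["free"],
        "reserved_minus_allocated": unused_reserved,
    }


for msg in ERRORS:
    print(check(msg))
```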
Thanks for reading!