I get a CUDA out of memory error even though there is more free memory than the allocation needs.
FYI, I'm using DataParallel (not DistributedDataParallel).
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 3; 47.99 GiB total capacity; 13.14 GiB already allocated; 31.59 GiB free; 13.53 GiB reserved in total by PyTorch)
I've checked many times with nvidia-smi and Task Manager, and memory usage never goes over 33 GiB / 48 GiB on each GPU (I'm using 4 of them). The allocation that fails is only 3.62 GiB, while the error message itself reports 31.59 GiB free.
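For reference, the per-GPU numbers above can also be checked from inside PyTorch rather than nvidia-smi. A minimal sketch (assuming the device indices match what nvidia-smi shows):

```python
import torch

def gib(n_bytes):
    """Convert a byte count to GiB, matching the units in the error message."""
    return n_bytes / 2**30

# Print allocated / reserved / free memory for each visible GPU.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # driver-level free/total bytes
    print(f"GPU {i}: {gib(torch.cuda.memory_allocated(i)):.2f} GiB allocated, "
          f"{gib(torch.cuda.memory_reserved(i)):.2f} GiB reserved, "
          f"{gib(free):.2f} GiB free of {gib(total):.2f} GiB")
```

Note that `memory_allocated` (tensors currently held) and `memory_reserved` (cached by PyTorch's allocator) can differ a lot, which is why nvidia-smi and the error message report different numbers.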
Now I suspect that PyTorch didn't reserve enough memory even though more is available, and I'm hoping I can force PyTorch to reserve more memory manually.
This problem is really painful; I've seen similar cases reported, but found no working solution.
Any advice/suggestions would be appreciated.
Thanks for reading!