Hi, this is very similar to this post here: Unable to allocate cuda memory, when there is enough of cached memory, but I just wanted to check whether my proposed solution should work as a fix.
My error is:
RuntimeError: CUDA out of memory. Tried to allocate 1.53 GiB (GPU 3; 15.78 GiB total capacity; 6.74 GiB already allocated; 792.19 MiB free; 13.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
As you can see, the 1.53 GiB allocation fails even though only 6.74 GiB of the 13.82 GiB reserved by PyTorch is actually allocated. Ideally the ~7 GiB of reserved-but-unallocated memory could serve the 1.53 GiB request, but the allocator apparently cannot find a large enough contiguous block, i.e. the cache is fragmented. This is likely because my application does a lot of GPU-CPU swaps, as it uses a memory offload system I've developed (similar to ZeRO).
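To confirm it really is fragmentation, I've been dumping the caching allocator's stats with the public `torch.cuda.memory_summary` API (a minimal sketch; the device index 3 matches the GPU in my error message):

```python
import torch

# Only meaningful on a machine with CUDA available.
if torch.cuda.is_available():
    # Per-device breakdown of allocated vs. reserved memory, including
    # block counts by size bucket. A large reserved total made up of many
    # small free blocks is the signature of fragmentation.
    print(torch.cuda.memory_summary(device=3, abbreviated=True))
```

The gap between "reserved" and "allocated" in that summary is exactly the ~7 GiB that the failed allocation could not use.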
I know that DeepSpeed handles memory management itself to avoid this issue, but I'm just looking for a quick fix. Would setting this variable:
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
avoid fragmentation and thus resolve my issue? What are the implications of setting this variable?
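Concretely, I was planning to set it like this (a sketch; the 128 MB threshold is just my first guess, not a recommended value):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator is
# initialized, so it must be set before the first CUDA allocation --
# in practice, before `import torch` in the training script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import only after the variable is set
```

Equivalently I could `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the shell before launching the process. My understanding is that this stops the allocator from splitting blocks larger than 128 MB, which should keep large contiguous blocks available, but I'd like confirmation of the downsides.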
I would appreciate any help on this subject.