CUDA allocator not able to use cached memory [solution]

Hi, this is very similar to this post here: Unable to allocate cuda memory, when there is enough of cached memory, but I just wanted to check if my proposed solution should work as a fix.

My error is:

RuntimeError: CUDA out of memory. Tried to allocate 1.53 GiB (GPU 3; 15.78 GiB total capacity; 6.74 GiB already allocated; 792.19 MiB free; 13.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

As you can see, I’m trying to allocate 1.53 GiB while only about 6.74 GiB is actually allocated. Ideally the roughly 7 GiB that PyTorch has reserved but not allocated could satisfy the 1.53 GiB request, but the allocator doesn’t seem able to reuse it. I suspect this is because my application does a lot of GPU-CPU swaps, since it uses a memory offload system I’ve developed (similar to ZeRO), which likely fragments the cached blocks.
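For reference, this is roughly how I’m checking how much of the reserved pool is sitting idle (just the standard torch.cuda memory counters; the device index is the one from the error message):

```python
import torch

device = torch.device("cuda:3")  # GPU 3, as in the error message

allocated = torch.cuda.memory_allocated(device)  # bytes held by live tensors
reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
print(f"allocated:              {allocated / 2**30:.2f} GiB")
print(f"reserved:               {reserved / 2**30:.2f} GiB")
print(f"cached but unallocated: {(reserved - allocated) / 2**30:.2f} GiB")

# Detailed breakdown, including inactive split blocks (a sign of fragmentation)
print(torch.cuda.memory_summary(device))
```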

I know that DeepSpeed handles memory management itself to avoid this issue, but I’m just looking for a quick fix. Would setting this variable:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

avoid fragmentation and thus resolve my issue? What are the implications of setting this variable?
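For context, this is how I would plan to set it, assuming the variable just needs to be visible before the caching allocator initializes (so either exported in the shell before launching, or set before importing torch):

```python
import os

# Set before importing torch so the caching allocator picks it up
# (exporting it in the shell before launching the script works too).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402

# Any CUDA allocation from here on is subject to the 128 MiB split limit.
x = torch.randn(1024, 1024, device="cuda")
```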

I would appreciate any help on this subject.

Tuning the caching allocator’s split size is somewhat in the realm of black magic, so it’s hard to predict what will happen other than running your code/model with a few settings and seeing what works.
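Something like a quick sweep over a few values can at least narrow it down (a rough sketch; `train.py` is a stand-in for whatever script reproduces the OOM):

```python
import os
import subprocess

# Try a few split sizes and see which, if any, avoids the OOM.
for mb in (32, 64, 128, 256):
    env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF=f"max_split_size_mb:{mb}")
    print(f"--- max_split_size_mb:{mb} ---")
    subprocess.run(["python", "train.py"], env=env)
```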

Can confirm that setting max_split_size_mb worked. Hopefully this post helps anyone else with the same issue.
