How to avoid memory fragmentation?

CUDA out of memory. Tried to allocate 6.85 GiB (GPU 0; 23.69 GiB total capacity; 9.79 GiB already allocated; 2.73 GiB free; 16.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The message says that the reserved memory is significantly larger than the allocated memory, which points to fragmentation. What needs to be done to handle the fragmentation?

Can someone please suggest how to avoid this issue? I have already tried freeing the cache, and I have limited the splitting of allocator blocks with export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128. But it doesn’t help.
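
For reference, here is a minimal sketch of those two steps (with a toy tensor standing in for my real workload); as far as I understand, PYTORCH_CUDA_ALLOC_CONF is read when the caching allocator initializes, so it has to be set before the first CUDA allocation:

import os

# Set before the first CUDA tensor is created, otherwise the
# allocator has already initialized and the setting is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.empty(256, 1024, 1024, device="cuda")  # ~1 GiB fp32 stand-in tensor
del x                     # drop the last reference first...
torch.cuda.empty_cache()  # ...then release cached blocks back to the driver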

You may want to first get info on how GPU memory is allocated by using

torch.cuda.memory_summary(device=None, abbreviated=False)
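
For example, a minimal sketch that prints the summary together with the two raw counters behind it:

import torch

print(torch.cuda.memory_summary(device=None, abbreviated=False))

# A large gap between these two counters is the fragmentation
# the error message is pointing at: memory is reserved by the
# allocator but too fragmented to serve one big request.
allocated = torch.cuda.memory_allocated() / 2**30
reserved = torch.cuda.memory_reserved() / 2**30
print(f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")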

You can also try reducing your model size, batch size…
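
If you still need the original effective batch size, one common workaround is to shrink the per-step batch and accumulate gradients. A minimal sketch, where the model, optimizer, and data are toy stand-ins for yours:

import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()   # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                     # 4 micro-batches of 8 ~= one batch of 32

optimizer.zero_grad()
for i in range(8):                  # stand-in for the real data loader
    inputs = torch.randn(8, 512, device="cuda")
    targets = torch.randint(0, 10, (8,), device="cuda")
    loss = loss_fn(model(inputs), targets)
    (loss / accum_steps).backward() # scale so grads match the large batch
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()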