CUDA Out of Memory; but enough is free

Hello, I’m running a transformer model from the Hugging Face library and I’m getting the following CUDA out-of-memory error:

RuntimeError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 3.95 GiB total capacity; 2.58 GiB already allocated; 80.56 MiB free; 2.71 GiB reserved in total by PyTorch)

How does this error make any sense if it needs to allocate 48 MiB but there are still 80.56 MiB free?

Here is my setup:

Ubuntu 20.04

It’s because of fragmentation. If you’re using around 90% of device memory, the allocator can fail to find a big enough contiguous free block, even when the total free memory is larger than the request.
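To make the fragmentation point concrete, here is a toy sketch in plain Python. It is not how the CUDA allocator actually works; the block layout and the `largest_free_run` helper are made up purely for illustration. It only shows how the total free memory can exceed a request while no single contiguous gap is big enough.

```python
def largest_free_run(blocks):
    """Return (total_free, largest_contiguous_free) in MiB.

    `blocks` is a list of (size_mib, is_free) tuples in address order.
    Neighbouring free blocks are treated as one contiguous run.
    """
    total = 0
    largest = 0
    run = 0
    for size, is_free in blocks:
        if is_free:
            total += size
            run += size
            largest = max(largest, run)
        else:
            run = 0  # an in-use block breaks the contiguous run
    return total, largest


# Hypothetical layout: small free gaps separated by in-use blocks.
layout = [
    (500, False), (30, True),
    (900, False), (25, True),
    (1200, False), (25, True),
]

request = 48  # MiB, as in the error message
total_free, largest = largest_free_run(layout)
print(total_free, largest, largest >= request)  # 80 30 False
```

With this made-up layout there are 80 MiB free in total, but the largest contiguous gap is 30 MiB, so a 48 MiB request cannot be satisfied.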

You can try explicitly running Python’s garbage collection and calling torch.cuda.empty_cache(), but this only helps in some cases. Another thing to try is to avoid allocating tensors of varying sizes (e.g. varying batch sizes).
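A minimal sketch of the cleanup step, assuming a standard PyTorch install. The `release_cached_memory` helper name is hypothetical, and the guarded import is only there so the snippet also runs on a machine without PyTorch or without a GPU. As noted above, this only helps in some cases.

```python
import gc

try:
    import torch
except ImportError:  # assumption: allow the sketch to run without PyTorch
    torch = None


def release_cached_memory():
    """Run Python GC, then ask PyTorch to release its unused cached blocks.

    Returns allocated/reserved memory in MiB, or None if CUDA is unavailable.
    """
    gc.collect()  # drop unreachable Python objects that still hold tensors
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()  # return unused cached memory to the driver
    mib = 1024 ** 2
    return {
        "allocated_mib": torch.cuda.memory_allocated() / mib,
        "reserved_mib": torch.cuda.memory_reserved() / mib,
    }


stats = release_cached_memory()
print(stats)
```

Tensors that are still referenced (for example, kept in a list or a lingering variable) are not freed by this, so `del` those references first. For keeping batch sizes constant, `torch.utils.data.DataLoader` accepts `drop_last=True`, which drops the final smaller batch.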