How can I run out of memory when most memory is available?

I'm doing some computations (basically using torch as numpy with a GPU) and I ran out of memory even though plenty seems to be available. How is that possible?

I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 11.00 GiB total capacity; 972.39 MiB already allocated; 8.57 GiB free; 986.00 MiB reserved in total by PyTorch)

So basically I have 8.57 GiB free but run out of memory trying to allocate 144 MiB. Can GPU memory suffer from very severe heap fragmentation, or something along those lines?

Could you post a code snippet to reproduce this issue?
I doubt it’s memory fragmentation, so I would like to debug it.

I found the reason: I was simply using much more memory than I expected. The real problem seems to be that the error message was misleading; why it reports so much free memory, I don’t know.
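For anyone hitting the same confusion, a quick sanity check is to add up what your tensors should cost before trusting the error message. A minimal sketch in plain Python (the shape and dtype below are made-up examples, not from my actual code):

```python
def tensor_bytes(shape, dtype_bytes=4):
    """Expected memory for one dense tensor: product of dims times element size."""
    n = 1
    for dim in shape:
        n *= dim
    return n * dtype_bytes

# Hypothetical example: a batch of 256 float32 tensors of shape 3x1024x1024
size = tensor_bytes((256, 3, 1024, 1024))  # float32 -> 4 bytes per element
print(f"{size / 2**20:.0f} MiB")  # → 3072 MiB, i.e. 3 GiB for a single tensor
```

On the PyTorch side, `torch.cuda.memory_allocated()` and `torch.cuda.memory_summary()` report what the caching allocator actually holds, which makes it easy to compare expectation against reality.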