How to interpret the memory information

I got the following OOM error, but I cannot understand the message precisely.

OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 2; 39.59 GiB total capacity; 31.01 GiB already allocated; 21.19 MiB free; 31.83 GiB reserved in total by PyTorch)

From the message, it seems PyTorch reserved only 31.83 GiB. Given the GPU capacity of 39.59 GiB, there should be roughly 39.59 - 31.83 ≈ 7.8 GiB free, yet only 21.19 MiB is reported free. Does that mean there are ~8 GiB of memory not managed by the PyTorch allocator? Could you help me understand where those 8 GiB go?

Is the reserved memory the total memory that is allocated and cached by the memory allocator?
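In other words, the question is whether the two counters relate as reserved = allocated + cached. A minimal sketch of how they can be read back at runtime (assuming device index 2 from the error message):

    import torch

    dev = torch.device("cuda:2")  # device index taken from the error message

    alloc = torch.cuda.memory_allocated(dev)    # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(dev)  # bytes reserved by the caching allocator

    print(f"allocated: {alloc / 2**30:.2f} GiB")
    print(f"reserved:  {reserved / 2**30:.2f} GiB")
    print(f"reserved - allocated: {(reserved - alloc) / 2**30:.2f} GiB")

    # Detailed per-pool breakdown of the allocator's bookkeeping
    print(torch.cuda.memory_summary(device=dev))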

Also, I got the following memory stats printout, which seems to align with the error message.

Other processes might use it, which should be visible via nvidia-smi. Memory fragmentation could also play a role, but I doubt you are losing so much memory due to this.
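A minimal way to capture that view while the job is running (it simply shells out to nvidia-smi, so it assumes the binary is on the PATH):

    import subprocess

    # The process table at the bottom of the report lists every process holding
    # GPU memory, including memory the PyTorch allocator never sees.
    report = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    print(report.stdout)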

I checked via nvidia-smi that all memory is free when the system is idle, both before and after my test, so I don't think other processes are using it.

Is it possible that some memory is allocated in the process but not managed by the PyTorch allocator (e.g., memory that goes directly through the CUDA allocator)?

If it were fragmentation, wouldn't that be counted in the "reserved" memory? Only 31.83 GiB is reserved…
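One way to narrow this down would be to compare the allocator's view with the driver's view of the whole device. A minimal sketch, assuming device index 2 and a PyTorch version that exposes torch.cuda.mem_get_info:

    import torch

    dev = 2  # device index from the error message

    # Driver-level view of the device: includes the CUDA context, other
    # processes, and any allocation made outside PyTorch's caching allocator.
    free, total = torch.cuda.mem_get_info(dev)

    reserved = torch.cuda.memory_reserved(dev)
    outside = (total - free) - reserved  # in-use memory that PyTorch does not manage

    print(f"total:    {total / 2**30:.2f} GiB")
    print(f"free:     {free / 2**30:.2f} GiB")
    print(f"reserved: {reserved / 2**30:.2f} GiB")
    print(f"used outside the PyTorch allocator: {outside / 2**30:.2f} GiB")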

Yes, this would be possible if you are using a custom CUDA extension and are allocating memory directly there.
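As a rough illustration of that effect (not a real extension): the sketch below calls cudaMalloc directly through ctypes, so the allocation never shows up in the caching allocator's reserved count even though it consumes device memory. The library name libcudart.so is an assumption for a Linux setup and may differ on your machine.

    import ctypes
    import torch

    torch.cuda.init()
    dev = torch.cuda.current_device()

    def snapshot(tag):
        free, total = torch.cuda.mem_get_info(dev)
        reserved = torch.cuda.memory_reserved(dev)
        print(f"{tag}: device free {free / 2**30:.2f} GiB, reserved by PyTorch {reserved / 2**30:.2f} GiB")

    snapshot("before")

    # Allocate 1 GiB directly through the CUDA runtime, bypassing PyTorch's
    # caching allocator, the way a custom extension calling cudaMalloc would.
    cudart = ctypes.CDLL("libcudart.so")  # assumed library name; adjust for your install
    ptr = ctypes.c_void_p()
    assert cudart.cudaMalloc(ctypes.byref(ptr), ctypes.c_size_t(1 << 30)) == 0

    snapshot("after")  # device free drops by ~1 GiB; PyTorch's reserved count is unchanged

    cudart.cudaFree(ptr)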

Could you try to post a minimal, executable code snippet which would reproduce this behavior on a 40GB GPU?