Out of memory in torch (CUDA)

Hi,

I am building a preprocessing toolkit for images and trying to run “batch” inference for panoptic segmentation (using the DETR model from Hugging Face). Torch then tried to allocate a large memory space (see the text below). I set max_split_size_mb=512, and this run processes 10 files totaling about 13 MB.
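For reference, I set it through the environment variable before importing torch (this is my understanding of the documented route; 512 is just the value I chose):

```python
import os

# Must be set before torch makes its first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after the variable is set
```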

I want to understand what this allocation (5.25 GiB in this case) is for: what purpose it serves and what uses the space, since the input itself is only about 13 MB.

best,
somedays


OutOfMemoryError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 15.69 GiB total capacity; 13.66 GiB already allocated; 1.36 GiB free; 13.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


Is your question what is consuming the memory?

GPU memory is allocated to store the datasets and the model.

If you can't reduce the model's total size, you can reduce the batch size, for example.

Can you print a model summary to see the total number of GB allocated by the model?
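Something like this rough sketch would print the weight footprint (I'm using a torchvision ResNet as a stand-in; substitute the DETR model you load):

```python
import torch
import torchvision

# Stand-in model; substitute the DETR model you load from Hugging Face.
model = torchvision.models.resnet50()

# Parameters and buffers are what the weights occupy; gradients and
# optimizer state would come on top of this during training.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
print(f"weights + buffers: {(param_bytes + buffer_bytes) / 1024**3:.3f} GiB")
```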

My question is about the memory allocation itself, not about how much memory the trained model consumes.

best,
somedays

There is no difference: the allocated memory is the space that the model and the dataset take in GPU memory.

My question is why torch allocates a large space (GBytes) for the input even though the input totals only dozens of MBytes, and why torch reserves more than 10 GB (see the message above). My guess is that the trained model lives in the reserved space, and that torch (or torch.cuda) also allocates GBytes for caching (temporary space that works like a cache).

So, I want to understand torch's memory allocation subsystem and how it works.

I want to understand:

  1. why torch tries to allocate GBytes of space for MBytes of input (this is my guess at what happens), and
  2. how the memory allocation subsystem works to produce this (see the sketch below).
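As far as I can tell from the docs, the allocated/reserved split can be watched like this (a minimal sketch; the tensor is just an arbitrary example):

```python
import torch

def show(tag):
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag}: allocated {alloc:.1f} MiB, reserved {reserved:.1f} MiB")

x = torch.randn(1024, 1024, device="cuda")  # ~4 MiB of float32
show("after alloc")

del x              # the tensor is freed from the allocator's view ...
show("after del")  # ... but its block stays in the reserved cache

torch.cuda.empty_cache()  # hand the cached blocks back to the driver
show("after empty_cache")
```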

best,
somedays


I see, fair point. I guess it tries to fit all the images into memory to train faster.

I think they discuss some details here:

@somedays

PyTorch won’t load the entire dataset onto the GPU behind your back, as that wouldn’t make sense.

@somedays are you forgetting the forward activations? The parameters and inputs could be tiny compared to the activations, depending on the model architecture. This post describes it in more detail for a ResNet.
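A quick way to see this is to compare allocated memory before and after a forward pass that keeps the autograd graph alive. A rough sketch with a torchvision ResNet (not your exact DETR setup; panoptic-sized inputs would scale these numbers up):

```python
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
x = torch.randn(8, 3, 224, 224, device="cuda")

torch.cuda.reset_peak_memory_stats()
before = torch.cuda.memory_allocated()

out = model(x)  # graph is kept alive, so activations stay resident
after = torch.cuda.memory_allocated()
peak = torch.cuda.max_memory_allocated()

print(f"input:        {x.numel() * x.element_size() / 1024**2:.1f} MiB")
print(f"held forward: {(after - before) / 1024**2:.1f} MiB (mostly activations)")
print(f"peak extra:   {(peak - before) / 1024**2:.1f} MiB during the forward")
```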


I now understand why torch tried to allocate GBytes of space for the input data. I use Streamlit for my development, and streamlit.file_uploader() supports at most 200 MBytes per file. My guess is that the loader allocates the full 200 MBytes for each file, and the object returned by the loader carries that allocation. So 20 image files of around 1 MByte each can become 4 GBytes or more, which matches the size torch tried to allocate.

Through this experience, I have read the torch/cuda directory (especially memory.py) and learned how torch collects the memory statistics.
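For anyone who finds this later, the counters assembled in memory.py surface through torch.cuda and can be read like this (a minimal sketch):

```python
import torch

x = torch.randn(2048, 2048, device="cuda")  # something to measure

stats = torch.cuda.memory_stats()  # the dict assembled in memory.py
print(stats["allocated_bytes.all.current"])  # bytes in live tensors
print(stats["reserved_bytes.all.current"])   # bytes held in the cache

# The same counters as a human-readable table:
print(torch.cuda.memory_summary())
```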

best,
somedays