Memory allocation

Hello, I am totally new to PyTorch, so forgive my French … I have a stack trace showing this:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 446.00 MiB (GPU 0; 14.54 GiB total capacity; 753.56 MiB already allocated; 246.56 MiB free; 930.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My GPU has 14.54 GiB, so where does this go wrong? I should mention that I am running the app in a Docker container; are there any known limitations there?

Any help/info highly appreciated! Thanks!


Check if any other application is using memory and could cause the OOM issue.
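
One way to check this from inside the process is to compare the device-wide free memory with the total. In the error above, only 246.56 MiB of 14.54 GiB is free while PyTorch itself has reserved just 930 MiB, which points at memory held by something else. The sketch below assumes a PyTorch version that provides torch.cuda.mem_get_info; the helper name gpu_memory_report is made up for illustration.

```python
import torch


def gpu_memory_report(device: int = 0):
    """Return device-wide free/used/total memory in MiB, or None without CUDA.

    The numbers cover the whole GPU, so memory held by other processes
    or containers shows up as "used" here as well.
    """
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    mib = 1024 ** 2
    return {
        "free_mib": free_bytes / mib,
        "used_mib": (total_bytes - free_bytes) / mib,
        "total_mib": total_bytes / mib,
    }


report = gpu_memory_report()
print(report)
```

If used_mib is much larger than what your own process has reserved, another process is probably occupying the device.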

Hi Ptrblck,
Yes, there was one; killing it solved the problem. But suppose two containers are trying to share the GPU: how do you manage the contention? Thanks!


You could limit the available device memory for each process via torch.cuda.set_per_process_memory_fraction(fraction, device=None).
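
A minimal sketch of how this could look at process startup. The helper name cap_gpu_memory and the 0.45 split are assumptions for illustration, not a recommended value. The limit applies to memory managed by PyTorch's caching allocator in this process; exceeding it raises an out-of-memory error inside the process instead of taking memory from the neighbour.

```python
import torch


def cap_gpu_memory(fraction: float, device: int = 0) -> bool:
    """Limit this process to `fraction` of the device's total memory.

    Returns False (and does nothing) when no CUDA device is visible.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not torch.cuda.is_available():
        return False
    torch.cuda.set_per_process_memory_fraction(fraction, device=device)
    return True


# Hypothetical split: each of two containers takes a bit less than half
# of the GPU, leaving some room for the CUDA contexts themselves.
applied = cap_gpu_memory(0.45)
print("limit applied:", applied)
```

Each container would make this call once, before it allocates any tensors on the device.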

Hmm, thanks, but would that mean you need to statically divide the memory among the processes and set the limit per container/process? That looks like the only option, and it makes containerization less flexible…

Yes, you would define the limit per process. How else would you handle it? You could just use the default and allow all processes to allocate the entire memory but would then run into the original issue.

Got it, they are unaware of each other. Tx man!

Last question: when the Python script is finished and waiting for new work, can I 'release' the GPU in the meantime? I am now in a loop waiting for user input, so during this period I'd rather free up the GPU. Is there some statement for that? Thanks!

Yes, you could del all objects which are not needed anymore and clear the cache via torch.cuda.empty_cache(). This will release the unused cached memory back to the GPU driver and will allow other processes to allocate it. Only the CUDA context and tensors/objects that are still referenced will keep occupying memory.
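
A rough sketch of that pattern for an idle period. The helper names reserved_mib and release_gpu_cache are hypothetical, and the snippet falls back to the CPU when no GPU is visible so it can be run anywhere.

```python
import gc

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"


def reserved_mib() -> float:
    """MiB currently held by PyTorch's caching allocator (0.0 without CUDA)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_reserved() / 1024 ** 2


def release_gpu_cache() -> None:
    """Collect unreachable objects, then release unused cached GPU memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


work = torch.zeros(1024, 1024, device=device)
result = float(work.sum())  # keep only a plain Python number

before = reserved_mib()
del work                    # drop the last reference to the tensor
release_gpu_cache()
after = reserved_mib()

print(f"reserved before: {before:.1f} MiB, after: {after:.1f} MiB")
# ... now block on user input; other processes can use the released memory
```

The CUDA context itself stays resident for as long as the process lives, so the GPU usage of the process will not drop all the way to zero.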

Hi, and thanks again for your help, highly appreciated; I'm really new to this. Suppose I put the empty_cache() call in the finally clause of my Python script, would that be sufficient? All objects are deleted/disposed/destroyed by design at the end of the finally block, right? Or do I need to be really explicit about which objects to del? Doesn't Python have garbage collection like .NET does?

Python uses function scoping: local objects are deleted once you return from the function, as long as nothing else still references them. If you created tensors in such a function, they will be deleted on return and their CUDA memory will go back to PyTorch's cache. You can then free this cache via torch.cuda.empty_cache().
If you are creating objects in the global scope, you need to delete them explicitly before calling empty_cache(). The same applies to objects returned from a function that are still referenced.
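
The difference between the two cases can be sketched as follows. The function names run_job and handle_request are made up for illustration, and the code falls back to the CPU when no GPU is visible.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"


def run_job() -> float:
    # `scratch` is local: its last reference disappears when the function
    # returns, so its memory goes back to PyTorch's cache at that point.
    scratch = torch.ones(256, 256, device=device)
    return scratch.sum().item()  # return a plain float, not a tensor


def handle_request() -> float:
    try:
        return run_job()
    finally:
        # run_job's locals are already gone here, so the cache can be released.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()


total = handle_request()

# A tensor created in the global scope stays alive until it is deleted.
kept = torch.ones(8, device=device)
del kept  # without this, empty_cache() could not release its memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()

print(total)
```

An empty_cache() call in a finally clause only helps for tensors that are no longer referenced at that point; anything still bound to a global name or held by a caller keeps its memory.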

Yes, Python uses a garbage collector.