How does "reserved in total by PyTorch" work?

I got

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 9.76 GiB already allocated; 21.12 MiB free; 9.88 GiB reserved in total by PyTorch)

I know that my GPU has a total memory of at least 10.76 GB, yet PyTorch is only reserving 9.88 GB.
In another instance, I got a CUDA out of memory error in the middle of an epoch, and it said “5.50 GiB reserved in total by PyTorch”.
Why is it not making use of the available memory? How do I change the amount of memory reserved?

Using PyTorch 1.4.0 and CUDA 10.1.


You should try the following:

  • torch.cuda.empty_cache(): this should free up cached memory.

  • If the memory still does not get freed up, an active variable in your session is holding onto it. Restart your session and run your code again.

I managed to free up enough memory to get my network running in this instance. However, the point of this post is to understand the phrase “reserved in total by PyTorch” and to find a way to control it. As you may have noted, I also quoted another instance where it reserved only half of the available memory.


I have the exact same question. I cannot seem to locate any documentation on how PyTorch reserves memory, and the general information regarding memory allocation seems pretty scant.

I’m also experiencing CUDA out-of-memory errors when only half my GPU memory is being utilised (“reserved”).


I also ran into this problem. My GPU has 8 GB in total, yet CUDA reserves only 5 GB while running the program.


I am also facing this issue: 23.65 GiB on the card but only 12.53 GiB reserved by PyTorch. The card is set to exclusive mode.

RuntimeError: CUDA out of memory. Tried to allocate 10.84 GiB (GPU 0; 23.65 GiB total capacity; 1.03 GiB already allocated; 10.26 GiB free; 12.53 GiB reserved in total by PyTorch)

It would be really helpful to understand why this memory is not being utilised.

I am using Google Colab and the Hugging Face library. The code ran once, but when I restarted the session and reran it, I got the following error.

RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 7.43 GiB
total capacity; 6.58 GiB already allocated; 30.94 MiB free; 6.79 GiB reserved in 
total by PyTorch)

It would be really great to understand what is causing this issue.


Based on the error message it seems the session restart might not have cleared the GPU memory.
Did you change anything else or did you just restart the notebook?
Could you restart the environment and check the used GPU memory?

@LuckyIceland @Umais @abhinavdhere, see this issue, which gives more details.

It seems that “reserved in total” is the memory “already allocated” to tensors plus the memory cached by PyTorch. When PyTorch requests a new block of memory, it first checks whether there is sufficient memory left in the pool not currently utilized by PyTorch (i.e. total GPU memory minus “reserved in total”). If there isn’t enough, the allocator will try to clear the cache and return memory to the GPU, which reduces “reserved in total”; however, it can only release cached blocks of memory of which no part is currently allocated. If any part of a block is allocated to a tensor, the block cannot be returned to the GPU. Thus you can have scenarios where the tensor-allocated memory plus the free GPU memory is much less than the total GPU memory, because the cache is holding memory it cannot release.
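As a sanity check, this arithmetic can be applied to the numbers from the first error message in the thread. This is a minimal plain-Python sketch, not real allocator code:

```python
# Plain-Python arithmetic using the figures from the first error message
# (GPU 0; 10.76 GiB total; 9.76 GiB already allocated; 9.88 GiB reserved).
total_capacity = 10.76      # GiB reported for the device
already_allocated = 9.76    # GiB held by live tensors
reserved_total = 9.88       # GiB = allocated + cached by the allocator

cached = reserved_total - already_allocated       # blocks kept for reuse
outside_pool = total_capacity - reserved_total    # memory PyTorch never reserved

print(f"cached by the allocator: {cached:.2f} GiB")       # ~0.12 GiB
print(f"outside PyTorch's pool: {outside_pool:.2f} GiB")  # ~0.88 GiB
```

Note that the driver reported only 21.12 MiB as free, far less than the ~0.88 GiB outside PyTorch's pool; the difference is presumably consumed by the CUDA context and/or other processes on the device.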


Are you sure about that? It seems like the amount I have “free”, plus the amount reserved but not allocated by PyTorch, should be enough. Or am I misunderstanding something?

0.633 + (10.72 - 8.86) = 2.493 > 1.93 (all in GiB)

This looks like the exact scenario I outlined above. The (10.72 - 8.86) = 1.86 GiB can only be released back to CUDA (and potentially joined with the 0.633 GiB in a new allocation to meet your requirement) if the blocks contained within it have not been split, with some part currently allocated to a tensor. If that has happened (which it looks like it has, otherwise you wouldn’t see an error), then the only way to recover the 1.86 GiB is to delete the tensor(s) allocated within the now-fragmented block. Then, when you run torch.cuda.empty_cache(), the block will be released and rejoin the main CUDA pool, showing up as “free”, and you should be able to allocate a new block to hold your 1.93 GiB tensor.
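The fragmentation behaviour described here can be mimicked with a toy model. This is pure Python and emphatically not the real CUDA caching allocator; `ToyCachingAllocator` and `Block` are illustrative names only:

```python
# Toy sketch: a cached block can only be released back to the device
# if no tensor still occupies any part of it.

class Block:
    def __init__(self, size):
        self.size = size
        self.used = 0  # bytes currently allocated to tensors within this block

class ToyCachingAllocator:
    def __init__(self):
        self.blocks = []  # blocks reserved from the device

    def reserved(self):
        return sum(b.size for b in self.blocks)

    def allocated(self):
        return sum(b.used for b in self.blocks)

    def empty_cache(self):
        # Only fully-free blocks can be returned to the device.
        self.blocks = [b for b in self.blocks if b.used > 0]

alloc = ToyCachingAllocator()
big = Block(1024)
big.used = 1          # one tiny tensor pins the whole 1024-byte block
alloc.blocks.append(big)

alloc.empty_cache()
print(alloc.reserved())   # still 1024: the block is pinned by a single byte

big.used = 0              # i.e. `del tensor` in a real session
alloc.empty_cache()
print(alloc.reserved())   # 0: the block is released back to the device
```

This is why deleting the offending tensors before calling torch.cuda.empty_cache() is what actually shrinks “reserved in total”.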

Hmm, thanks! Dealing with CUDA/PyTorch/deep learning memory issues is especially interesting, since there is both CPU memory (which can be freed via gc.collect() and del statements) and GPU memory (as you mentioned, via torch.cuda.empty_cache()).


I’m running into a similar issue. Here is the error.

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 199.97 MiB already allocated; 8.69 MiB free; 210.00 MiB reserved in total by PyTorch)

Clearly my GPU has enough memory to allocate 20.00 MiB. The limit seems to be set by the amount “reserved in total by PyTorch”, and that amount seems absurdly low to me.

@jon @david-macleod I’d be curious to hear your thoughts on whether this scenario is consistent with your interpretation of these memory specs. If so what would you recommend to fix it. Or if anyone else has an explanation or solution.


Could you check if other processes are using the GPU and allocating memory, since PyTorch reports that only 8.69 MiB are free?

Thanks for responding @ptrblck.

There are other processes using the GPU. I’m seeing this error on a predictor service that has 6 workers which receive prediction requests, preprocess the data, and then call the PyTorch model. The workers are spun up via multiprocessing, so I assume each one has its own copy of the model.

Then I’m guessing that after making 6 copies of the model, most of the GPU memory is consumed, and I see this error when a particularly large example is received by one of the workers.

Does that make sense? If so I will try reducing the number of workers to free up some of the GPU memory.

Yes, this makes sense, and indeed an unexpectedly large batch in one process might create this OOM issue. You could try to reduce the number of workers, or somehow guard against these large batches and maybe process them sequentially.
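One possible way to “guard” against oversized batches, as a hedged sketch: split any batch that exceeds a size budget and process the pieces sequentially. `predict_safely`, `run_model`, and `MAX_BATCH` are illustrative names, not part of any real API:

```python
# If a batch exceeds a size budget, split it and process the pieces
# sequentially instead of in one large allocation.
MAX_BATCH = 8  # arbitrary budget; tune for your model and GPU

def predict_safely(batch, run_model):
    if len(batch) <= MAX_BATCH:
        return run_model(batch)
    results = []
    for start in range(0, len(batch), MAX_BATCH):
        # Each chunk allocates at most MAX_BATCH items' worth of memory.
        results.extend(run_model(batch[start:start + MAX_BATCH]))
    return results

# Example with a dummy "model" that just doubles its inputs:
print(predict_safely(list(range(20)), lambda b: [x * 2 for x in b]))
```

In a real service, `run_model` would wrap the forward pass; processing chunks sequentially trades latency for a bounded peak memory footprint.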

It’s not working for me:

OOM: Ran out of memory with exception: CUDA out of memory.
Tried to allocate 5.59 GiB (GPU 0; 10.92 GiB total capacity; 4.28 GiB
already allocated; 5.14 GiB free; 5.20 GiB reserved in total by PyTorch)

Same here…

RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 10.76 GiB total capacity; 8.84 GiB already allocated; 117.62 MiB free; 8.90 GiB reserved in total by PyTorch)

What a PyTorch … Extremely sad…

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.15 GiB already allocated; 13.09 MiB free; 1.16 GiB reserved in total by PyTorch)

Is this issue still not resolved? Sad.
I too am facing the same problem.

RuntimeError: CUDA out of memory. Tried to allocate 540.00 MiB (GPU 0; 4.00 GiB total capacity; 1.94 GiB already allocated; 267.70 MiB free; 2.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Only half of the memory (1.94 GiB) is being used; the other half (2.10 GiB) is not being used, or I’d say is being wasted!
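Regarding the max_split_size_mb suggestion in the error message above, a minimal sketch: the option is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable and must be set before the first CUDA allocation (ideally before importing torch, or via the shell). The value 128 is an arbitrary example to tune, not a recommendation:

```python
import os

# max_split_size_mb limits how large a cached block may be split,
# which can reduce the fragmentation the error message mentions.
# Must be set before PyTorch makes any CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Equivalently, set it in the shell before launching the script: PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py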