Apologies for resurrecting this - I am having the same issue regularly. I get the RuntimeError, as in the first message of this thread, the first time I send any data to the GPU.
I have exclusive access to the GPU, so I could solve my issue if I could force the GPU memory to be cleared or freed. Is there a function in torch which I can use to do this? I’ve reviewed the information about memory management on the docs here and I’m not entirely sure that torch.cuda.empty_cache() will resolve this.
An ideal solution for me would look something like:
...
torch.cuda.clear_memory_allocated() # entirely clear all allocated memory
model = model.to(device)
...
My feeling is that your issue is different from the one discussed here, @JamesOwers. You, obviously, need to free the variables that hold the GPU RAM (or switch them to cpu), you can’t tell pytorch to release them all for you since it’d lead to an inconsistent state of your interpreter.
Go over your code and free any variables you no longer need as soon as they aren’t used anymore.
If you’re using a jupyter nb you could create a “virtual” scope using ipyexperiments, which can then automate the release.
If outside jupyter, wrap your code in a function: unless you create circular references, once the function returns it’ll release the local variables and free up the memory for you.
p.s. perhaps one could write something to automatically switch all cuda variables to cpu, diverting the “leak” to general RAM, which may help in the short term, but it’s not really solving the actual issue with your code, just delaying the inevitable.
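To illustrate the point about function scope and circular references, here is a minimal sketch in plain Python. The Buffer class is a hypothetical stand-in for an object that holds GPU memory, and the behaviour shown relies on CPython's reference counting; it is not a statement about how PyTorch manages its own allocator.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large tensor held by a local variable."""

    def __init__(self, name):
        self.name = name


def run_step(registry):
    buf = Buffer("activations")
    registry.append(weakref.ref(buf))  # a weak ref does not keep buf alive
    return buf.name


def run_step_with_cycle(registry):
    buf = Buffer("activations")
    buf.self_ref = buf  # circular reference: buf refers to itself
    registry.append(weakref.ref(buf))
    return buf.name


refs = []

# Without a cycle, the local is released as soon as the function returns.
run_step(refs)
freed_without_cycle = refs[0]() is None

# With a cycle, the object survives until the cycle collector runs.
gc.disable()  # keep the collector from running on its own during the demo
try:
    run_step_with_cycle(refs)
    alive_before_collect = refs[1]() is not None
    gc.collect()
    freed_after_collect = refs[1]() is None
finally:
    gc.enable()
```

If the object were a CUDA tensor, the second case is the one where memory would linger after the function returns until a collection happens.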
Thanks for your reply. To be clear, I get this error the first time I send any data to the GPU.
That is, when I call model.to(device), this is the first variable to be sent to the GPU - unless I’m misunderstanding, at this point I don’t have any variables to clear. Despite this, I get the error. I am therefore presuming there is uncleared memory from a previous process.
To address the others: I’m not in a notebook, and this is within a function. Additionally, I do not get any error about 95 times out of 100 when running this code.
Well, what GPU memory consumption is reported before you run this function? (nvidia-smi, or whatever other reporting tool you use)
If it’s the first call, then you should have 100% GPU available before you do that call. I assume you’re using your own GPU card.
If you use some kind of online service, then it’s a different story.
If you start with GPU RAM already used up you should kill the previous processes if they didn’t quit.
Alternatively, it’s possible that you have 100% GPU RAM available but your very first variable is already bigger than the available GPU RAM.
It’s just very hard to diagnose your issue w/o you telling the full story - setup, size of GPU, local/online, etc.
In any case add some code to measure available RAM at the beginning of your code and an assert for it to bail if it can’t detect a sufficient amount of GPU RAM available, telling you to clean up any run-away processes if any.
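A rough sketch of such a start-up check, assuming nvidia-smi is on the PATH and that its CSV query output contains plain integers. The function names and the 90% threshold are made up for illustration, and the check raises RuntimeError rather than using a bare assert so that it still runs under python -O.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines of 'used, total' (in MiB) into a list of (used, total) ints."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every visible GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_csv(out)


def check_gpu_mostly_free(rows, gpu_index=0, min_free_fraction=0.9):
    """Bail out early if the chosen GPU does not have enough free memory."""
    used, total = rows[gpu_index]
    free_fraction = (total - used) / total
    if free_fraction < min_free_fraction:
        raise RuntimeError(
            f"GPU {gpu_index}: only {free_fraction:.0%} of memory is free "
            f"({used} MiB of {total} MiB in use); "
            "clean up any run-away processes first."
        )
    return free_fraction
```

At the top of a job this would be called as check_gpu_mostly_free(query_gpu_memory()).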
@stas - again, your input here is much appreciated. Thank you for taking the time to help me diagnose this.
I’ll describe the setup:
GPU cluster with a broad mix of different gpu types (Tesla K40m, GeForce Titan X, GeForce GTX Titan X, GeForce Titan X (Pascal))
Slurm job scheduler to coordinate job submission:
There are many users and my job will begin after another job has just finished
When my job begins, I have exclusive access to that GPU - the GPUs are only ever used by one user’s job at a time
It’s a service locally hosted by my university, so I can submit support tickets etc. I have reported the issue and we are struggling to fix it. I’m here because I’m trying to find a simple workaround!
At the beginning of the job I report the usage with the tool GPUtil - but this uses nvidia-smi under the hood. The usage reported is always 0 - as expected, e.g.:
| ID | GPU | MEM |
------------------
| 0 | 0% | 0% |
I know that my variable is smaller than the available RAM because I’ve measured the size of my model (it’s a few megabytes), and because the error message is slightly different from yours; mine follows the format (tried to allocate {small_number} ... {much_larger_number} free; ...). For example:
RuntimeError: CUDA out of memory.
Tried to allocate 4.50 MiB (GPU 0; 11.91 GiB total capacity;
213.75 MiB already allocated; 11.18 GiB free; 509.50 KiB cached)
This is what has led me to the conclusion that the GPU has not been properly cleared after a previously running job has finished.
Your proposed solution to bail if there isn’t enough RAM at the start will not work - there is enough RAM according to nvidia-smi and indeed the error message. I imagine there is not enough contiguous memory!
Regardless, to fix, I think all I need to do is to clear the GPU’s memory at the beginning of my job (or simply wait until this is done). Is there a way to force this?
Alternatively, it could be that the GPU is clear, but the first variable is sent to the GPU memory in an extremely fragmented way. Is there any reason why this would happen?
Thank you for the additional information, @JamesOwers.
So your error message is very telling:
It says that you have 11GB (!) free and it can’t allocate 5MB - that makes no sense.
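One way to flag this kind of contradictory message automatically is to compare the two numbers it reports. The sketch below assumes the message wording quoted in this thread ("Tried to allocate X ... Y free"); other PyTorch versions word the message differently, in which case the parser simply returns None. The function names are made up for illustration.

```python
import re

_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def _to_bytes(value, unit):
    return float(value) * _UNITS[unit]


def parse_oom_message(message):
    """Return (requested_bytes, free_bytes), or None if the format is not recognised."""
    tried = re.search(r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)", message)
    free = re.search(r"([\d.]+) (KiB|MiB|GiB) free", message)
    if not (tried and free):
        return None
    return _to_bytes(*tried.groups()), _to_bytes(*free.groups())


def looks_inconsistent(message):
    """True when the failed request is smaller than the memory reported as free."""
    parsed = parse_oom_message(message)
    if parsed is None:
        return False
    requested, free = parsed
    return requested < free
```

A job wrapper could use looks_inconsistent(str(err)) to tell this odd case apart from a genuine shortage of memory before deciding whether to rerun.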
See this discussion, where I tried to diagnose non-contiguous memory only to discover that nvidia will re-allocate fragmented pages of at least 2MB to make contiguous memory. So unless your code somehow allocates memory in a way that consumes only a tiny fraction of each 2MB page, fragmenting all 12GB of RAM, this shouldn’t really happen.
So a few things I’d like to suggest in no particular order:
Catch that failure and add a sleep so that the program doesn’t exit at the point of failure, then check what nvidia-smi says about that card’s RAM status: what is the reported used/free memory there? This is to double check whether there is something wrong with the card and it reports wrong numbers.
Since you said it happens 5% of the time, did you observe that it perhaps happens with the same specific card? i.e. again a faulty card?
Can you reliably reproduce the problem when you hit that 5% situation?
Reduce your variable size by, say, half - does it fit into the memory? If not, halve it again and so on - see what fits.
When that error happens, can you catch it and then try to allocate a simple large tensor of a few GBs, say with torch.zeros() or torch.ones()? For example, torch.ones((n*2**18)).cuda().contiguous(), where n is the number of desired MBs (a float32 element takes 4 bytes, so 2**18 elements make 1 MB). Replace cuda() with to(...) if that matches your setup better.
My feeling is that your array of cards has a faulty card. That last suggestion could be the key - allocate 10GB of RAM (say 80% of the card’s capacity) and free it right away at the beginning of your program - if it fails, you don’t want to use that card.
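The probing and halving suggestions above could be combined roughly as follows. This is only a sketch: largest_fitting_mb and try_alloc_cuda are illustrative names, torch is imported lazily so the search logic can be exercised without a GPU, and catching RuntimeError is a blunt instrument that will also swallow CUDA errors other than out-of-memory.

```python
def largest_fitting_mb(try_alloc, start_mb, min_mb=1):
    """Halve the requested size until try_alloc(size_mb) succeeds.

    Returns the first size that fits, or 0 if nothing down to min_mb fits.
    """
    size = start_mb
    while size >= min_mb:
        if try_alloc(size):
            return size
        size //= 2
    return 0


def try_alloc_cuda(n_mb, device="cuda:0"):
    """Try to allocate n_mb MiB on the device and release it straight away."""
    import torch  # imported here so the search above works without torch

    try:
        # float32 takes 4 bytes, so 2**18 elements occupy 1 MiB
        probe = torch.ones(n_mb * 2**18, device=device)
    except RuntimeError:  # CUDA out-of-memory surfaces as a RuntimeError
        return False
    del probe
    torch.cuda.empty_cache()  # hand the cached block back to the driver
    return True
```

At the start of a job, largest_fitting_mb(try_alloc_cuda, 10 * 1024) would report how much of a 10GB request the card can actually serve; a result far below the card’s capacity is a reason to report the card and resubmit elsewhere.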
@stas - many thanks for this. I’m going to implement your suggestion of attempting to allocate some known large tensor right at the start of the job, and report & rerun upon failure.
As the error message states your GPU is running out of memory, so you would need to either reduce the batch size, the model itself, or could potentially trade compute for memory using torch.utils.checkpoint.
Well, you may want to read this thread from the top - as it discusses this problem - and then it’d make sense, thanks to the helpful replies of others.
Tried to allocate 2.00 MiB (GPU 0; 11.00 GiB total capacity; 9.44 GiB already allocated; 997.01 MiB free; 10.01 GiB reserved in total by PyTorch)
I don’t think I have the fragmentation issue discussed above, but 2 MB shouldn’t be a problem (I’m using a really small batch size).
I’ve also tried running on 2 GPUs that are bridged with an SLI bridge. This gives me a total of 22 GB, but I’m getting the same error message with 11.00 GiB. Does Pytorch support GPUs that are bridged?
You are running out of memory, so you would need to reduce the batch size or shrink the overall model architecture. Note that your GPU has 2GB, which would limit the executable workloads on this device.
You could also try to use torch.utils.checkpoint to trade compute for memory.
Reducing to the smallest batch_size = 2 still didn’t work. It gives the error: RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 2.00 GiB total capacity; 1.01 GiB already allocated; 105.76 MiB free; 1.05 GiB reserved in total by PyTorch)
I tried restarting and so on, but it didn’t work.
When running without cuda, the notebook freezes, both locally and in colab.
Oh, it might be a problem in my implementation: a pretrained network using cuda works.
It could be that your GPU is just too small for the job you’re trying to do. Perhaps use Colab to train (free) and then your GPU for finetune/inference?
I think I have a similar issue. The model is a BiLSTM+CRF. GPU memory usage spikes randomly and then I get RuntimeError: CUDA out of memory. A larger batch size worked fine. A smaller batch size worked fine once, and a couple of other times it ended in the runtime error.
All experiments have the same parameters except the following:
Light blue - batch size 128
All others - batch size 32
Have a look at this memory profiler/monitor if you’re running in a jupyter notebook - https://github.com/stas00/ipyexperiments - it might help you to identify where you lose that memory.