CUDA out-of-memory runtime error handling: is falling back to CPU possible?

We usually add a simple CUDA availability check before using the GPU, like this:

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This works fine for detecting whether a CUDA device is present. However, I still hit out-of-memory errors occasionally, particularly when several tasks run at the same time.
I wonder whether a more advanced solution exists that could help prevent such errors: for example, an early check that estimates the hardware resources needed and compares them with what is actually available, so that the run-time crash is avoided?

A few warmup iterations might help, but it would also depend on your actual use case.
E.g. if you are working with a language model, each input batch could have a different size depending on the longest sentence in the current batch. These things can be runtime dependent, so even a few warmup iterations wouldn’t help.
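Because the peak memory can change from batch to batch, one option is to catch the out-of-memory error where it happens and retry that step on the CPU. The following is only a minimal sketch: `run_with_cpu_fallback`, `is_cuda_oom`, and the `step` callable are hypothetical names, and matching on the message text "out of memory" is an assumption about how the error is reported (PyTorch raises it as a `RuntimeError`).

```python
def is_cuda_oom(exc):
    """Heuristic: treat a RuntimeError mentioning 'out of memory' as a CUDA OOM."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def run_with_cpu_fallback(step, device):
    """Call step(device); if it fails with a CUDA OOM, retry once as step("cpu").

    `step` is a user-supplied callable (hypothetical) that moves the model and
    the batch to the given device and runs one unit of work.
    Returns (result, device_actually_used).
    """
    try:
        return step(device), device
    except RuntimeError as exc:
        # Re-raise anything that is not an OOM, or an OOM that already came from the CPU.
        if str(device) == "cpu" or not is_cuda_oom(exc):
            raise
    # Retry outside the except block, so the exception object and the frames
    # it references are released before the CPU run starts.
    return step("cpu"), "cpu"
```

With an existing `model` and `batch`, a call could look like `out, used = run_with_cpu_fallback(lambda d: model.to(d)(batch.to(d)), dev)`. The CPU retry can be much slower, so whether it is preferable to failing fast depends on the task.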


No utility could predict whether the user will be running other applications that use GPU memory. To limit the device memory available to the current process, you could use torch.cuda.set_per_process_memory_fraction(fraction, device=None).
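As a rough illustration of that call, the sketch below caps the current process at a fraction of the device's memory. The wrapper name `cap_gpu_memory` and the 0.5 in the usage note are arbitrary choices for the example, not recommendations.

```python
def cap_gpu_memory(fraction, device_index=0):
    """Limit this process to `fraction` of the memory on one CUDA device.

    Returns True if the cap was applied, False if no CUDA device is available.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    import torch  # imported lazily; only needed once the arguments are valid

    if not torch.cuda.is_available():
        return False
    torch.cuda.set_per_process_memory_fraction(fraction, device_index)
    return True
```

Calling `cap_gpu_memory(0.5)` once at startup would apply the limit for the rest of the process. Note that the cap does not prevent the error: an allocation beyond the limit still fails with an out-of-memory error, but it keeps one task from taking memory that the other tasks need.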

@ptrblck Great, thanks. That will be helpful.
To explore a bit further, what I have in mind is roughly the following:

  1. Before each task runs, estimate how much memory the upcoming task will need.
  2. Then check how much of the hardware resource is currently available.
  3. Then decide whether it is safe to go ahead; if it is not, the program can still fall back to the CPU rather than moving forward and crashing.

Of course, this cannot handle every situation, particularly multiple tasks running in parallel (which might need a lock, as in C/C++), but I think it would already mitigate the problem and make the program less likely to crash.
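The three-step check described above could be sketched roughly as follows. This is an illustration under assumptions, not an established utility: `choose_device` and `free_cuda_bytes` are hypothetical helpers, the 20% headroom is an arbitrary safety margin, and the memory estimate has to come from the caller (for instance from `torch.cuda.max_memory_allocated()` recorded in an earlier, similar run).

```python
def choose_device(required_bytes, free_bytes, headroom=0.2):
    """Return "cuda" if the estimate plus headroom fits into free memory, else "cpu"."""
    if required_bytes < 0 or headroom < 0:
        raise ValueError("required_bytes and headroom must be non-negative")
    if free_bytes <= 0:
        return "cpu"  # no usable GPU memory (or no GPU at all)
    needed = required_bytes * (1.0 + headroom)
    return "cuda" if needed <= free_bytes else "cpu"


def free_cuda_bytes(device_index=0):
    """Free memory on the device as reported by the driver, or 0 without CUDA."""
    import torch  # imported lazily so choose_device can be used without a GPU

    if not torch.cuda.is_available():
        return 0
    free, _total = torch.cuda.mem_get_info(device_index)
    return free
```

A task would then pick its device with something like `dev = torch.device(choose_device(estimate, free_cuda_bytes()))`. The value from `mem_get_info` is only a snapshot taken at the moment of the call, so the decision is only as good as the estimate and the timing.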