CUDA out-of-memory runtime error handling: is a fallback to CPU possible?

We usually put a simple CUDA detection check before using the GPU, like below:

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This works fine for detecting whether a CUDA device is available, but I still run into out-of-memory errors occasionally, particularly if I have a few tasks running at the same time.
I wonder whether a more advanced solution exists that could help prevent such errors, for example an early check that estimates the hardware resources the upcoming task will need and compares them to what is actually available, so that the runtime crash is avoided.


A few warmup iterations might help, but it would also depend on your actual use case.
E.g., if you are working with a language model, each input batch could have a different size depending on the longest sentence in the current batch. These things can be runtime-dependent, so even a few warmup iterations wouldn’t help.
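If you do know an upper bound on the shapes, a warmup at that bound could look roughly like the sketch below (the model, vocabulary size, and maximum batch/sequence sizes are only placeholders):

import torch
import torch.nn as nn

model = nn.Embedding(30000, 512).to("cuda:0")  # placeholder for the real model

# a few dummy iterations at the largest expected batch and sequence length,
# so the big allocations are triggered (and cached) before real training starts
for _ in range(3):
    dummy = torch.randint(0, 30000, (64, 512), device="cuda:0")  # max batch x max seq len
    model(dummy).sum().backward()
    model.zero_grad(set_to_none=True)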

Additionally,

no utility could predict if the user will be running other applications using GPU memory. To limit the available device memory for the current process you could use torch.cuda.set_per_process_memory_fraction(fraction, device=None).
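
A minimal sketch of how that could look (the 0.5 fraction and the device index are just example values):

import torch

if torch.cuda.is_available():
    dev = torch.device("cuda:0")
    # cap this process at ~50% of the device memory; allocations beyond
    # that fraction will raise an OOM error instead of taking the whole GPU
    torch.cuda.set_per_process_memory_fraction(0.5, device=dev)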

@ptrblck Thanks a lot, that will be helpful.
Just to explore further, what I mean is mostly the following:

  1. Before each task runs, check how many resources the upcoming task would need.
  2. Then check the currently available hardware resources, etc.
  3. Then it should be possible to judge whether it is safe to go forward; if not, there is still the choice to fall back to CPU rather than just moving forward and having the program crash (a rough sketch of this check follows below).

Of course, this cannot handle all situations, in particular multiple tasks running in parallel (which might need a lock, like in C/C++), but I think it would already mitigate the problem somewhat and make the program less likely to crash.
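
To make it concrete, here is a rough sketch of the pre-check I have in mind, assuming torch.cuda.mem_get_info is available (recent PyTorch versions) and that required_bytes is an estimate I compute myself for the upcoming task:

import torch

def pick_device(required_bytes, safety_margin=1.2):
    # fall back to CPU when no GPU is present or the free GPU memory
    # looks too small for the estimated requirement
    if not torch.cuda.is_available():
        return torch.device("cpu")
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    if free_bytes < required_bytes * safety_margin:
        return torch.device("cpu")
    return torch.device("cuda:0")

# e.g. 100M float32 parameters need ~400 MB for the weights alone;
# gradients, optimizer states and activations add more on top
dev = pick_device(required_bytes=100_000_000 * 4)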

@ptrblck Back to this topic. I am still occasionally stuck on the CUDA OOM issue.
I am not sure whether the exception-handling strategy below for CUDA memory is workable, rather than running into a runtime error and exiting.

For example:

try:
    # 1. check if a GPU is available, otherwise raise:
    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device available")
    dev = torch.device("cuda:0")
    # 2. limit the memory fraction for this process
    #    (however, will this raise an error if the allocation fails?)
    torch.cuda.set_per_process_memory_fraction(0.9, device=dev)
    # 3. do a few "warmup iterations" -- but still, if OOM happens, an error will be raised.
except RuntimeError:
    dev = torch.device("cpu")
# run the normal business here on dev

set_per_process_memory_fraction will not avoid the OOM, but will reduce the available memory for PyTorch so that other applications can use the remaining GPU memory.
The suggestion was targeting your use case of running different applications at once.

@ptrblck For the step below, is there any suggestion? If something is not correctly configured and OOM happens, the program will crash rather than raise an exception…
# 3. do a few "warmup iterations" -- but still, if OOM happens, an error will be raised.
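
Concretely, what I would like to achieve is something like the sketch below, assuming the OOM does surface as a catchable RuntimeError (recent PyTorch versions raise torch.cuda.OutOfMemoryError, a RuntimeError subclass); the probe size is just a placeholder:

import torch

dev = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
try:
    # probe: a dummy allocation roughly the size the task will need
    # (256 * 1024 * 1024 float32 values ~= 1 GB, placeholder value)
    probe = torch.empty(256, 1024, 1024, device=dev)
    del probe
except RuntimeError:
    # a CUDA OOM lands here as an exception instead of killing the process,
    # so the program can fall back to the CPU
    torch.cuda.empty_cache()
    dev = torch.device("cpu")
# continue the normal workload on dev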

@Leonmac I had the same idea. Have you made any progress on this since?

Since on Windows there still appears to be enough ‘shared memory’ available beyond my actual GPU memory, perhaps this UVM initiative could be helpful?
Support for NVIDIA UVM technology · Issue #44380 · pytorch/pytorch (github.com)