Help: CUDA error: out of memory

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I’m getting this error message when I try to load a PyTorch model in a Flask application.

The error is raised when you are running out of memory on your device, so you could try to reduce the memory requirement, e.g. by lowering the batch size (if possible).
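As a rough sketch (the dataset and model below are just dummies, not taken from your code), lowering batch_size in the DataLoader and wrapping inference in torch.no_grad() are the usual first steps to shrink the peak memory:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy data/model, only to illustrate the idea: a smaller batch_size
    # lowers the peak GPU memory needed per forward pass.
    dataset = TensorDataset(torch.randn(1024, 128))
    loader = DataLoader(dataset, batch_size=8)      # e.g. reduced from 64
    model = nn.Linear(128, 10).to("cuda")
    model.eval()

    with torch.no_grad():                           # inference doesn't need gradients,
        for (batch,) in loader:                     # which also saves GPU memory
            out = model(batch.to("cuda"))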

Hi, I got the same error when calling x.to("cuda:3"), where x = torch.randn(1, 1).

= = = = = = = = = = = = = = = = = =
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
= = = = = = = = = = = = = = = = = =

However, I was able to call x.to("cuda:2"). Running nvidia-smi gives:

+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208…    Off  | 00000000:67:00.0 Off |                  N/A |
| 27%   32C    P8     1W / 250W |      4MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208…    Off  | 00000000:68:00.0 Off |                  N/A |
| 28%   38C    P8    28W / 250W |     21MiB / 11016MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Could the problem be related to setting non_blocking=True, i.e. using some_tensor.to(device, non_blocking=True)? I’ve been successfully running my actual code all day, which uses 4 GPUs (cuda:0 through cuda:3), but at some point I started getting the above error. If I recall correctly, it might have started after I killed one of two "python main.py" processes.

This sounds more like a setup issue, unrelated to the usage of non_blocking=True.
Check if some dead processes are still using the device. In case the CUDA context is corrupted in the current session, start a new Python process. However, since the error seems to have popped up suddenly, you might also check if restarting your workstation helps.
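
If it helps, here is a small sketch to confirm what the current Python process can actually see per device (torch.cuda.mem_get_info is assumed to be available in your PyTorch version; it returns free and total memory in bytes):

    import torch

    # Print free/total memory for every device visible to this process.
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}: {free / 2**20:.0f} MiB free of {total / 2**20:.0f} MiB")

If the numbers here disagree with what nvidia-smi reports, a leftover process or a corrupted context in the current session is the likely culprit.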

Hey,

Try to kill the process that is still holding the GPU. To do that:

  1. Run nvidia-smi
  2. In the lower panel you will see the processes that are currently running on your GPUs
  3. Check their PIDs
  4. Kill those processes with kill PID_NUMBER (see the sketch below for a scripted way to look them up)
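
If you prefer to do the lookup from Python, a minimal sketch (assuming nvidia-smi is on your PATH and supports the --query-compute-apps interface):

    import subprocess

    # List the compute processes currently holding GPU memory.
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout)

    # Once you have confirmed a PID belongs to a stale run, terminate it from
    # the shell with:  kill <PID>   (or kill -9 <PID> if it does not exit)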

Hope it helps