Even with the simplest calculation I get an OOM error, and I have already reinstalled conda and CUDA.
As for the env info:
PyTorch version: 1.4.0+cu100
Is debug build: No
CUDA used to build PyTorch: 10.0
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 460.91.03
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
@1309123499
You can check the GPU usage via the `nvidia-smi` command. There you can see whether an existing process is already taking up the GPU memory, and kill it if needed.
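To make that check scriptable, here is a minimal sketch that wraps `nvidia-smi`'s CSV query mode to list the processes currently holding GPU memory. The helper names (`parse_compute_apps`, `gpu_processes`) are hypothetical, and this assumes `nvidia-smi` is on the PATH:

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse the output of
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader`
    into a list of (pid, used_memory) tuples."""
    procs = []
    for line in csv_text.strip().splitlines():
        pid, mem = [field.strip() for field in line.split(",")]
        procs.append((int(pid), mem))
    return procs

def gpu_processes():
    """Query nvidia-smi for processes that are holding GPU memory.
    Requires an NVIDIA driver installed on the machine."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

If `gpu_processes()` reports a PID you don't recognize (e.g. a crashed training run from a previous session), killing it with `kill <pid>` should free the memory.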
I connect to a remote server to train my model. After encountering this problem, I reinstalled CUDA, rebuilt the conda environment, and reconnected to the server. The problem appeared between two debugging runs, which means I changed nothing about the machine or the environment. It really confuses me.