CUDA runtime error: out of memory

I came across an error after updating from PyTorch 0.3.1 to 0.4.

a = torch.randn(3, 5)
a.cuda()
Traceback (most recent call last):
File "", line 1, in
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCGeneral.cpp:844

torch.cuda.get_device_name(1)
Traceback (most recent call last):
File "", line 1, in
File "/home1/nh/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 272, in get_device_name
return get_device_properties(device).name
File "/home1/nh/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 290, in get_device_properties
init() # will define _get_device_properties and _CudaDeviceProperties
File "/home1/nh/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 143, in init
_lazy_init()
File "/home1/nh/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCTensorRandom.cu:25

My system information is as follows:

PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: 8.0.61

OS: Ubuntu 14.04.5 LTS
GCC version: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
CMake version: version 3.5.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 8.0.61
GPU models and configuration:
GPU 0: TITAN X (Pascal)
GPU 1: TITAN X (Pascal)
GPU 2: TITAN X (Pascal)
GPU 3: TITAN X (Pascal)
GPU 4: TITAN X (Pascal)
GPU 5: TITAN X (Pascal)
GPU 6: TITAN X (Pascal)
GPU 7: TITAN X (Pascal)

Nvidia driver version: 375.26
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcnn_bak/libcudnn.so
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcnn_bak/libcudnn.so.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcnn_bak/libcudnn.so.5.0.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcnn_bak/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcnn_bak/libcudnn_static.a
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.0.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
/usr/local/lib/python2.7/dist-packages/torch/lib/libcudnn.so.6

Versions of relevant libraries:
[pip3] msgpack-numpy (0.4.1)
[pip3] numpy (1.14.2)
[pip3] numpydoc (0.7.0)
[pip3] pytorch-nlp (0.2.0)
[pip3] torch (0.4.0)
[pip3] torchfile (0.1.0)
[pip3] torchsummary (1.1)
[pip3] torchtext (0.3.0)
[pip3] torchvision (0.2.1)
[conda] cuda80 1.0 0 soumith
[conda] magma-cuda80 2.3.0 1 pytorch
[conda] pytorch-nlp 0.2.0
[conda] torch 0.4.0
[conda] torchfile 0.1.0
[conda] torchtext 0.3.0
[conda] torchvision 0.2.1

What's wrong with my system? Has anyone come across the same problem?

Please help me! Thanks!

My cuDNN version is as follows:

cat cuda/include/cudnn.h | grep CUDNN_MAJOR
#define CUDNN_MAJOR 7
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

I installed cuDNN on a remote Linux server without sudo.

Have you checked whether your GPU is indeed out of memory, e.g. with nvidia-smi?
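To make that check easy to repeat, here is a minimal sketch that queries nvidia-smi for per-GPU memory usage (the query flags are standard nvidia-smi options; `gpu_memory_report` is just a hypothetical helper name, and it returns None if nvidia-smi is not on the PATH):

```python
import shutil
import subprocess

def gpu_memory_report():
    """Return per-GPU 'index, used, total' memory lines from nvidia-smi,
    or None if nvidia-smi is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip().splitlines()

print(gpu_memory_report())
```

If some device shows nearly all of its memory used by another process, PyTorch's lazy CUDA init can fail with exactly this "out of memory" error even for a tiny tensor.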

If that's not the case, a restart of the machine has often solved this issue in the past. Could you try that?
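Another thing worth trying, if only one of the eight devices is busy: mask the other GPUs before torch is imported, so the CUDA context is only ever created on a free device. A minimal sketch (the index "2" is just an example; pick whichever device nvidia-smi shows as free):

```python
import os

# Hide all but one (hypothetically free) GPU *before* importing torch,
# so torch._C._cuda_init() never touches the busy devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # example index, not a recommendation

# import torch  # torch would now see that GPU as cuda:0
```

Equivalently, `CUDA_VISIBLE_DEVICES=2 python script.py` on the command line has the same effect.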

Thank you anyway! I fixed this issue by reinstalling PyTorch 0.4.