.cuda() Problem - I can't transfer objects to the GPU

Bruno_Oliveira · October 11, 2019, 8:30pm

Hey everyone.

I recently formatted my SSD and did a fresh install of Ubuntu 18.04. As usual, I reinstalled my deep learning setup, but this time I ran into trouble. After reviewing some code for a while, I tried to run it and Python became unresponsive. After investigating, I realized the hang happens whenever I try to move anything to my GPU with .cuda(). It doesn’t matter whether I’m moving a complex model or a simple torch.randn(10) tensor; the issue occurs either way. When it happens, I have to get out of Python with Ctrl+Z, as it seems to be stuck indefinitely. I’m sharing this in the hope of getting some info on what might be happening.

bruno@vox:~$ nvidia-smi
Fri Oct 11 17:28:03 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 23%   51C    P2    39W / 180W |   1273MiB /  8118MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1083      G   /usr/lib/xorg/Xorg                           246MiB |
|    0      1309      G   /usr/bin/gnome-shell                         149MiB |
|    0      2184      C   python                                       121MiB |
|    0      2293      G   ...uest-channel-token=15138661163890541212    56MiB |
|    0      2581      G   ...quest-channel-token=7637078073055189632   120MiB |
|    0      3024      C   python                                        93MiB |
|    0      3078      C   python                                        99MiB |
|    0      3110      C   python                                        81MiB |
|    0      3215      C   python                                       103MiB |
|    0      3846      C   python                                        97MiB |
|    0      8113      C   python                                       103MiB |
+-----------------------------------------------------------------------------+

bruno@vox:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I was running PyTorch 1.2 after the fresh Ubuntu install and upgraded to 1.3 once I noticed the issue, but it didn’t help.
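
For anyone trying to reproduce this without locking up their own session, here is a minimal sketch that runs the same torch.randn(10).cuda() call in a child process with a time limit. The helper name run_with_timeout, the REPRO string, and the 60-second limit are illustrative choices, not anything from PyTorch itself, and it assumes torch is installed for the interpreter that runs the script.

```python
import subprocess
import sys

# Snippet executed in a child interpreter. It mirrors the
# torch.randn(10).cuda() call described in the post above.
REPRO = "import torch; x = torch.randn(10).cuda(); print(x.device)"


def run_with_timeout(code, timeout_s=60.0):
    """Run `code` in a fresh Python process and report how it ended.

    Returns a (status, output) tuple where status is "ok", "error",
    or "timeout". The name and return shape are hypothetical helpers
    for this sketch.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, (proc.stdout + proc.stderr).strip()


if __name__ == "__main__":
    status, output = run_with_timeout(REPRO)
    print(status, output)
```

A "timeout" status would correspond to the hang described above, while "error" with a traceback points to a different problem, such as a missing driver. One caveat: if the child is blocked inside a driver call that ignores the kill signal, the wrapper itself could still wait, so treat this as a convenience rather than a guarantee.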

Bruno_Oliveira · October 11, 2019, 9:38pm

I downgraded CUDA to 10.0 and it seems to work now. I don’t think this problem should have happened with CUDA 10.1, though, so it might be worth taking a look at what happened.
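
Since the fix here came down to the CUDA version, it may help to record which CUDA version the installed PyTorch binary was built against, next to the nvidia-smi and nvcc output. The sketch below is one way to do that; describe_cuda_setup is a hypothetical helper name, and it only reads version attributes and the availability flag rather than moving any tensor to the GPU. Note that the thread does not say whether the downgrade was of the system toolkit or of the CUDA build that PyTorch was installed with, so this check covers only the PyTorch side.

```python
def describe_cuda_setup():
    """Collect PyTorch/CUDA version info as a plain dict.

    Hypothetical helper for this thread. Returns {"torch": None}
    when PyTorch is not importable in the current interpreter.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None}
    return {
        "torch": torch.__version__,
        # CUDA version the PyTorch binary was built against
        # (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # Whether PyTorch can see a usable CUDA device.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

Comparing built_with_cuda against the "CUDA Version" line from nvidia-smi and the release reported by nvcc makes it easier to see which combination is actually in play when reporting a hang like this one.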

smth · October 12, 2019, 11:19am

Looking into it. Tracking in https://github.com/pytorch/pytorch/issues/27807

Dmitry_Antonov · July 14, 2021, 11:17am

I have the same problem with CUDA Version: 11.3.