.cuda() freezes on Ubuntu system

I installed PyTorch/CUDA using conda, and initially it worked well. However, I recently discovered that my calls to .cuda() hang and freeze. Something as simple as torch.randn(10).cuda() freezes as well, and even Ctrl+C / Ctrl+Z won't terminate it.

My system has CUDA 8.0, and I made sure to download the corresponding PyTorch build that is compatible with it. Here is the output of "conda list | grep pytorch":

cuda80 1.0 h205658b_0 pytorch
pytorch 0.3.1 py36_cuda8.0.61_cudnn7.0.5_2 pytorch
torchvision 0.2.0 py36h17b6947_1 pytorch

Also here is the output of “nvidia-smi”:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 38%   79C    P0    95W / 250W |   2403MiB / 12206MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   46C    P8    17W / 250W |   2002MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1     28553    C  ...d: celery@phoenix28-embed#dev1:Worker-45]    157MiB |
+-----------------------------------------------------------------------------+

But I am not even sure whether this is related. Could someone help me pinpoint where the issue is?

Something is using all of your GPU 0 (note the 100% utilization in the nvidia-smi output).

Thanks! I am very new to PyTorch, and I was wondering: in this case, can I do anything to utilize the second GPU instead?

Yeah, .cuda() takes a device id, so .cuda(1) should work.
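A minimal sketch of this suggestion (the variable name x and the guard are my own additions; the guard just makes the snippet safe on machines without two GPUs):

```python
import torch

# Pass the device id to .cuda() to pick a specific GPU.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    x = torch.randn(10).cuda(1)   # allocate directly on the second GPU (id 1)
    print(x.get_device())         # device id the tensor lives on
else:
    x = torch.randn(10)           # CPU fallback
```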


CUDA_VISIBLE_DEVICES=1 python your_script.py
This runs your code on the second GPU only; here 1 is the id of the second GPU.
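One thing worth knowing about this approach: inside the restricted process, the remaining GPU is renumbered starting from 0. You can also set the variable from within the script itself, as long as it happens before torch initializes CUDA (a sketch, not the only way to do it):

```python
import os

# Equivalent to the shell prefix above; must run before the first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# The only visible GPU (physical GPU 1) is renumbered as device 0 here,
# so a plain .cuda() now lands on the second physical GPU.
if torch.cuda.is_available():
    x = torch.randn(10).cuda()
    print(torch.cuda.current_device())
```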


Thanks, it works! Can we also use nn.DataParallel so that we no longer need to worry about one GPU being occupied?

You can specify which devices nn.DataParallel uses with the device_ids argument. :slight_smile: Although running DataParallel on only one device won't give you any speedup…
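For completeness, a hedged sketch of the device_ids argument (the nn.Linear model and batch shapes are made up for illustration; the guard keeps it runnable on a CPU-only machine):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Split each input batch across GPUs 0 and 1; only worthwhile
# when more than one device is actually free.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 2])
```

Note that DataParallel gathers its outputs back onto the first device in device_ids, so that GPU needs a bit of extra headroom.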