.cuda() freezes on Ubuntu system

I installed PyTorch/CUDA using conda, and initially it worked well. However, I recently discovered that my calls to .cuda() hang and freeze. Something as simple as torch.randn(10).cuda() freezes as well, and even Ctrl+C / Ctrl+Z won't terminate it.

My system has CUDA 8.0, and I made sure to download the corresponding PyTorch build that is compatible with it. Here is the output of "conda list | grep pytorch":

cuda80 1.0 h205658b_0 pytorch
pytorch 0.3.1 py36_cuda8.0.61_cudnn7.0.5_2 pytorch
torchvision 0.2.0 py36h17b6947_1 pytorch

Also here is the output of “nvidia-smi”:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 38%   79C    P0    95W / 250W |   2403MiB / 12206MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:03:00.0     Off |                  N/A |
| 22%   46C    P8    17W / 250W |   2002MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1     28553    C  ...d: celery@phoenix28-embed#dev1:Worker-45]    157MiB |
+-----------------------------------------------------------------------------+

But I am not even sure whether this is related. Could someone help me pinpoint where the issue is?

Something is using all of your GPU 0 (note the 100% utilization in the nvidia-smi output).

Thanks! I am very new to PyTorch, and I was wondering: in this case, can I do anything to utilize the second GPU instead?

Yeah, .cuda() takes a device id, so .cuda(1) should work.
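A minimal sketch of this suggestion (the variable name x and the guard are my own additions; the guard just makes the snippet safe on machines without two GPUs):

```python
import torch

# Pass the device id to .cuda() to pick a specific GPU.
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    x = torch.randn(10).cuda(1)   # allocate directly on the second GPU (id 1)
    print(x.get_device())         # device id the tensor lives on
else:
    x = torch.randn(10)           # CPU fallback
```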


CUDA_VISIBLE_DEVICES=1 python your_script.py
This runs your code on the second GPU only; here 1 is the id of the second GPU.
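One thing worth knowing about this approach: inside the restricted process, the remaining GPU is renumbered starting from 0. You can also set the variable from within the script itself, as long as it happens before torch initializes CUDA (a sketch, not the only way to do it):

```python
import os

# Equivalent to the shell prefix above; must run before the first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# The only visible GPU (physical GPU 1) is renumbered as device 0 here,
# so a plain .cuda() now lands on the second physical GPU.
if torch.cuda.is_available():
    x = torch.randn(10).cuda()
    print(torch.cuda.current_device())
```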


Thanks, it works! Can we also use nn.DataParallel so that we no longer need to worry about one GPU being occupied?

You can specify which devices nn.DataParallel uses with the device_ids argument. :slight_smile: Although running DataParallel on only one device won't give you any speedup…
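For completeness, a hedged sketch of the device_ids argument (the nn.Linear model and batch shapes are made up for illustration; the guard keeps it runnable on a CPU-only machine):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# Split each input batch across GPUs 0 and 1; only worthwhile
# when more than one device is actually free.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

out = model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 2])
```

Note that DataParallel gathers its outputs back onto the first device in device_ids, so that GPU needs a bit of extra headroom.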