fqz
April 27, 2018, 3:40pm
Until now I was using PyTorch 0.3.1 on Linux with a GPU and CUDA 9.1, and everything worked great.
I just installed PyTorch 0.4 in a new conda env on the same machine, but now torch.cuda.is_available() returns False. When I switch back to the PyTorch 0.3.1 env, it returns True. I also installed PyTorch 0.3.1 in a third env, and there it returns False again.
Any pointers?
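When one env reports True and another reports False on the same machine, a useful first step is to print what each env's build actually contains and compare. Below is a minimal sketch, not an official PyTorch tool. The helper name `cuda_report` is hypothetical. It relies only on `torch.__version__`, `torch.version.cuda` (None for a CPU-only build), `torch.cuda.is_available()` and `torch.cuda.device_count()`. Run it once in each conda env:

```python
import importlib.util


def cuda_report():
    """Hypothetical helper: collect the facts that usually explain
    why torch.cuda.is_available() differs between two envs."""
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this env at all
        return {"torch_installed": False}
    import torch
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None means a CPU-only build
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # 0 when no usable GPU/driver is found
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` differs between the working and non-working envs, the difference between the installed packages is the first thing to look at.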
fqz
April 30, 2018, 6:43pm
I resolved this by installing with CUDA 9.0 instead of 9.1.
Same issue here. I installed PyTorch 0.4 with conda on an Ubuntu 17.10 machine. TensorFlow seems to work fine there.
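After reinstalling, you can confirm that the env really picked up the CUDA 9.0 build by comparing `torch.version.cuda` with the version you intended. This is a small sketch. The helper `same_cuda_release` is hypothetical, and it compares only major.minor because the string reported by the build may include a patch number:

```python
def same_cuda_release(expected, built_with):
    """Hypothetical helper: True if two CUDA version strings share major.minor.

    `built_with` is typically torch.version.cuda, which is None for CPU-only builds.
    """
    if built_with is None:
        return False

    def major_minor(version):
        # "9.0.176" -> (9, 0); "9.1" -> (9, 1)
        return tuple(int(part) for part in version.split(".")[:2])

    return major_minor(expected) == major_minor(built_with)
```

Usage inside the env: `same_cuda_release("9.0", torch.version.cuda)`.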
In [2]: torch.__version__
Out[2]: '0.4.0'
In [3]: torch.device("cuda")
Out[3]: device(type='cuda')
In [4]: cuda = torch.device("cuda")
In [5]: torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCGeneral.cpp line=70 error=30 : unknown error
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-6f6a0c1a1f8d> in <module>()
----> 1 torch.tensor([[1], [2], [3]], dtype=torch.half, device=cuda)
~/miniconda3/envs/torch/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_init()
159 "Cannot re-initialize CUDA in forked subprocess. " + msg)
160 _check_driver()
--> 161 torch._C._cuda_init()
162 _cudart = _load_cudart()
163 _cudart.cudaGetErrorName.restype = ctypes.c_char_p
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCGeneral.cpp:70
In [6]: torch.cuda.is_available()
Out[6]: False
Fixed by removing and then reinstalling the NVIDIA driver:
sudo apt remove nvidia-384
sudo apt install nvidia-384
You might need to restart the machine.
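To check whether a restart is still needed, you can look at which NVIDIA kernel module is actually loaded. On Linux with the proprietary driver, `/proc/driver/nvidia/version` reports it. The sketch below assumes that file contains a line with "Kernel Module" followed by the version number. The helper names are hypothetical:

```python
import re
from pathlib import Path

PROC_VERSION = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Hypothetical helper: pull the kernel-module version (e.g. '384.111')
    out of the text of /proc/driver/nvidia/version, or return None."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def loaded_driver_version(path=PROC_VERSION):
    """Return the version of the currently loaded NVIDIA kernel module,
    or None if the file is missing (module not loaded) or unreadable."""
    try:
        return parse_driver_version(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(loaded_driver_version())
```

If this prints None, or a version that does not match the package you just installed, the new module is probably not loaded yet, and that is when restarting the machine helps.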