[SOLVED] Error when initializing GPU

Whenever I try to initialize my GPU in PyTorch, I receive the following error:


THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu line=25 error=30 : unknown error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/local.jmatthews/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 143, in init
  File "/home/local.jmatthews/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/THCTensorRandom.cu:25

I’ve restarted my server with no luck.

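+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   40C    P5    18W / 151W |      0MiB /  8114MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+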

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

I’m running CentOS.

Any idea what’s going on?

I got a very similar error, and the cause was that I was using the wrong GPU id. An easy way to check whether this is your problem is to create a tensor on that GPU, e.g. a = torch.tensor([1., 2.], device=torch.device('cuda:0')), replacing 0 with the id you are trying to use. I solved it by using a correct GPU id.
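If you want to run that check as a small script, here is a minimal sketch. The helper names is_valid_gpu_id and check_gpu are made up for illustration and are not part of PyTorch; the torch calls used (torch.cuda.is_available, torch.cuda.device_count, torch.tensor, torch.device) are standard ones. It only tells you whether the id is in range and whether a tensor can be placed on that device, so treat it as a quick diagnostic rather than a fix.

```python
def is_valid_gpu_id(gpu_id, device_count):
    """Return True if gpu_id is an index into the device_count visible GPUs."""
    return isinstance(gpu_id, int) and 0 <= gpu_id < device_count


def check_gpu(gpu_id):
    """Try to place a small tensor on cuda:<gpu_id> and describe the outcome."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to PyTorch"
    count = torch.cuda.device_count()
    if not is_valid_gpu_id(gpu_id, count):
        return f"GPU id {gpu_id} is out of range; PyTorch sees {count} device(s)"
    try:
        # Same test as above: create a tiny tensor directly on the chosen GPU.
        a = torch.tensor([1., 2.], device=torch.device(f"cuda:{gpu_id}"))
    except RuntimeError as err:
        return f"CUDA error on cuda:{gpu_id}: {err}"
    return f"OK: tensor created on {a.device}"


if __name__ == "__main__":
    print(check_gpu(0))  # replace 0 with the id you are trying to use
```

GPU ids start at 0, so on a single-GPU machine like the one in the question the only valid id is 0.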

It turns out that someone had installed CUDA 9.1 on the system I was working on, while I had the PyTorch build for CUDA 8.0 installed. I installed PyTorch for CUDA 9.1 and it worked.
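For anyone who wants to compare the two versions side by side, here is a rough sketch. The function names are hypothetical; torch.version.cuda is the attribute that reports which CUDA version a PyTorch build targets (None for CPU-only builds). Keep in mind that the nvcc on your PATH may not be the toolkit actually installed system-wide: in my case nvcc still reported 8.0, so treat its output as a hint only.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkit_version():
    """Return the release reported by the nvcc on PATH, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


def pytorch_cuda_version():
    """Return the CUDA version PyTorch was built for, or None if unknown."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("nvcc on PATH reports:  ", toolkit_version())
    print("PyTorch built for CUDA:", pytorch_cuda_version())
```

If the two disagree, or if another CUDA version has been installed on the machine, installing the PyTorch build that matches the installed CUDA version is what fixed it for me.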