RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50

poojavinod100 · January 14, 2020, 8:20am

I was trying to run the extractive summarizer of the BERTSUM program (https://github.com/nlpyang/PreSumm/tree/master/src) in test mode with the following command:

python train.py -task ext -mode test -batch_size 3000 -test_batch_size 500 -bert_data_path C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\bert_data -log_file ../logs/val_abs_bert_cnndm -model_path C:\Users\hp\Downloads\bertext_cnndm_transformer -test_from C:\Users\hp\Downloads\bertext_cnndm_transformer\model_1.pt -sep_optim true -use_interval true -visible_gpus 1 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm

Here is the error log:

[2020-01-13 21:03:01,681 INFO] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at ../temp\aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
driver version : 10020
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "train.py", line 156, in <module>
    test_ext(args, device_id, cp, step)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\train_extractive.py", line 190, in test_ext
    model = ExtSummarizer(args, device, checkpoint)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\models\model_builder.py", line 168, in __init__
    self.to(device)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 426, in to
    return self._apply(convert)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 224, in _apply
    param_applied = fn(param)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 424, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 194, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50

I am sure that I have a CUDA-enabled GPU; I checked the list on NVIDIA's website. It is an NVIDIA GeForce GTX 950M, and I have used it for deep learning projects with CUDA before. Thinking a missing installation could be the problem, I installed CUDA and cuDNN following these instructions: https://www.easy-tensorflow.com/tf-tutorials/install/cuda-cudnn (latest versions, CUDA 10.2). I also tried adding os.environ['CUDA_VISIBLE_DEVICES'] = '0' in train.py, as this worked for people facing the same kind of error in help posts online. The error still persists.

I’d really appreciate it if someone could help me figure this out.
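For reference, CUDA_VISIBLE_DEVICES is read when CUDA is first initialized, so the assignment only has an effect if it runs before that point. A minimal sketch, assuming the variable is set at the very top of the entry script (train.py here); the guarded import and the visible_gpu_count helper are illustrative additions and not part of PreSumm:

```python
import os

# Set this before anything initializes CUDA; placing it ahead of
# "import torch" is the simplest way to guarantee the ordering.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import torch
except ImportError:  # guard so the sketch also runs without PyTorch
    torch = None


def visible_gpu_count():
    """Return how many GPUs PyTorch can see (0 if PyTorch is missing)."""
    if torch is None:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
    print("visible GPUs:", visible_gpu_count())
```

If another part of the program assigns the variable again later (for example from a command-line flag), the later value is the one that counts, provided CUDA has not been initialized yet.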

ptrblck · January 14, 2020, 11:18pm

How did you install PyTorch?
If you’ve installed the binaries, note that they will ship with their own CUDA, cudnn etc., so you don’t need to install these libraries locally unless you want to build from source or CUDA extensions.

Could you post the install log, please?
Were you able to use PyTorch with this GPU before?

poojavinod100 · January 15, 2020, 7:15pm

@ptrblck, thank you for the response. I remember I had installed PyTorch with conda. Around that time, I had done a pip install for a different version of torch. But ‘conda list torch’ gives me the current global version as 1.3.0.

Also, ‘conda list cuda’ returns this:

# packages in environment at C:\Users\hp\Anaconda3:
#
# Name                    Version                   Build  Channel
cuda100                   1.0                           0    pytorch
cudatoolkit               10.1.168                      0

I also recently installed CUDA 10.2, and cudnn with it globally.

I have used torch on my system for DL before. But I am now unsure whether the training in my DL projects was actually running on the CPU rather than the GPU (I know it sounds funny, but I was fairly sure it was the GPU, because training was quite quick). Is there any way I can check my GPU usage history?

When I went through my previous jupyter notebooks(not run from any virtual env), I found this:

And then I ran this code on my system yesterday:

import torch 
print(torch.__version__)

print(torch.cuda.current_device())
print(torch.cuda.device(0))
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
print(torch.cuda.is_available())
print(torch.cuda.current_device())

Output:

1.3.0
driver version : 10020
0
<torch.cuda.device object at 0x0000028603786FD0>
1
GeForce GTX 950M
True
0

This seems contradictory to me: the Jupyter notebook says that CUDA is not available, while the program run from cmd says that it is.

ptrblck · January 16, 2020, 2:41am

I guess the Python kernel / environment in your Jupyter notebook might not be the same one you are using in your terminal when running python script.py.
Could you create a (new) conda environment and execute the notebook as well as run your script in this env?
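One way to compare the two environments is to run the same small report in a notebook cell and from the terminal, then check whether the interpreter path and the PyTorch install location match. This is only a sketch; environment_report is a hypothetical helper name, and the torch fields are filled in only when PyTorch is importable:

```python
import importlib.util
import sys


def environment_report():
    """Collect the facts that identify which Python and PyTorch are in use."""
    report = {
        "python": sys.executable,  # interpreter running this code
        "prefix": sys.prefix,      # root of the active environment
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["torch_path"] = torch.__file__
        report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the "python" or "torch_path" lines differ between the notebook and the terminal, the two are not using the same installation, which would explain contradictory torch.cuda.is_available() results.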

wiTTyMinds_Technolog · April 7, 2020, 11:19am

Hi @ptrblck, I am trying to run a video feature script without using CUDA.
But I am also getting a similar error. I don’t have a CUDA-enabled device.
I tried disabling it using export CUDA_VISIBLE_DEVICES="".
But I am still getting this error:

  File "/home/penguin/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/penguin/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/penguin/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50

I have installed PyTorch using pip.

Output of:

import torch
print(torch.version.cuda)
print(torch.cuda.device_count())
print(torch.cuda.is_available())

10.1
0
False

Please Help.

ptrblck · April 8, 2020, 2:35am

It seems that somewhere in your code you are trying to push a module to the GPU.
Could you check for all model.cuda() and model.to('cuda') calls and remove them?
If you are using the to() approach, you could also write device-agnostic code by using:

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
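To track down the hard-coded calls mentioned above, a rough text scan of the script can help. The sketch below is a heuristic only: find_hard_coded_cuda is a hypothetical helper, it looks just for .cuda( and .to('cuda') style calls, and it will miss other forms such as torch.device('cuda') or device='cuda' keyword arguments.

```python
import re

# Matches .cuda(...) calls and .to('cuda') / .to("cuda:0")-style calls.
HARD_CODED_CUDA = re.compile(r"""\.cuda\(|\.to\(\s*['"]cuda(:\d+)?['"]""")


def find_hard_coded_cuda(source):
    """Return (line_number, line) pairs that push something to the GPU unconditionally."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Ignore trailing comments (naive: also cuts at '#' inside strings).
        code = line.split("#", 1)[0]
        if HARD_CODED_CUDA.search(code):
            hits.append((number, line.strip()))
    return hits


example = '''
model = Net()
model.cuda()
x = x.to("cuda:0")
y = y.to(device)  # fine: device-agnostic
'''

for number, line in find_hard_coded_cuda(example):
    print(number, line)
```

To scan a real file, pass its contents instead of the example string, e.g. find_hard_coded_cuda(open("script.py").read()), where script.py stands for whichever script raises the error.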

zahra · April 19, 2020, 6:22pm

Hi Ptrblck
I ran my code on Colab and received the same error. If I remove model.to(device), how can I run my program on the GPU?

ptrblck · April 19, 2020, 11:10pm

It won’t be possible to run the code on the GPU after removing all to('cuda') and cuda() calls.
@wiTTyMinds_Technolog got the error, since he does not have a GPU and somewhere in the script the tensors or model were still being pushed to the device.

The original error message points to a CUDA call, which wasn’t compiled for the current GPU architecture.
Are you running some custom CUDA extensions, did you build PyTorch from source, and which GPU are you using?

zahra · April 23, 2020, 8:05pm

Thanks.

Are you running some custom CUDA extensions? No, I am using https://colab.research.google.com/
Did you build PyTorch from source? No, I am using its modules and functions.
Which GPU are you using? I am using https://colab.research.google.com/ and I do not know which GPU it provides.

I found what caused this problem:

In Colab, I have to select GPU in the Runtime menu so that my code runs on the GPU.

Thanks a lot
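For the Colab case above, a quick way to confirm that the runtime really has a GPU attached, independently of PyTorch, is to ask the driver tools. A minimal sketch, assuming the nvidia-smi command-line tool is what reports the devices; list_gpus is a hypothetical helper name:

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is usable."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this machine / runtime
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []  # tool present but no device or driver reachable
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No GPU attached to this runtime")
```

An empty list on Colab suggests the notebook is running on a CPU-only runtime, in which case any .cuda() or .to('cuda') call will raise the error from this thread.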

Imanuel_Roz · October 17, 2021, 10:17pm

Hi, I am running a model using the GPU of Google Colab and I am getting this error:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "run.py", line 338, in <module>
    main()
  File "run.py", line 303, in main
    model = model.cuda()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 260, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 162, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
I checked, and my GPU is active and CUDA is available. I also tried the different solutions proposed here, but they didn't solve my problem.

ptrblck · October 18, 2021, 12:47am

What does torch.cuda.get_device_name(0) return?

Imanuel_Roz · October 18, 2021, 8:37am

Hi it gives me ‘Tesla K80’

ptrblck · October 18, 2021, 8:46am

Thanks for the update.
I cannot reproduce the issue in a new notebook on a K80 in Colab and can properly use the device.

Imanuel_Roz · October 20, 2021, 4:31pm

I fixed it: I changed CUDA_VISIBLE_DEVICES = 5 to CUDA_VISIBLE_DEVICES = 0 in the Python file, and it works now.
Thank you for your rapid response.
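The fix above is consistent with how CUDA_VISIBLE_DEVICES works: it lists the physical GPU indices a process may use, and an index that does not exist hides that entry and every entry after it. On a runtime with a single GPU, the value 5 therefore leaves no visible device. The sketch below models this rule in plain Python; visible_devices is a hypothetical helper, it handles integer indices only, and it assumes no duplicate entries. The same mechanism could explain the first post, where -visible_gpus 1 was passed on a laptop with one GPU, if train.py forwards that flag to CUDA_VISIBLE_DEVICES (an assumption that is not confirmed in the thread).

```python
def visible_devices(env_value, physical_gpu_count):
    """Map a CUDA_VISIBLE_DEVICES value to the physical GPU indices that stay visible.

    Simplified model: integer indices only (real CUDA also accepts GPU UUIDs).
    env_value=None means the variable is unset.
    """
    if env_value is None:
        return list(range(physical_gpu_count))  # unset: every GPU is visible
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break  # an invalid entry hides itself and everything after it
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    # Single-GPU machine, as in the Colab case above.
    for value in (None, "0", "5", "1", ""):
        print(repr(value), "->", visible_devices(value, physical_gpu_count=1))
```

Inside the process, the visible devices are renumbered from 0, so with CUDA_VISIBLE_DEVICES=0 on a single-GPU machine the device is reachable as cuda:0.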