[resolved] Cuda Runtime Error(30)

ycszen · March 16, 2017, 12:50pm

When I run torch.cuda.is_available(), I get the following error:

THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=109 error=30 : unknown error
Traceback (most recent call last):
  File "trainer.py", line 13, in <module>
    if torch.cuda.is_available():
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/__init__.py", line 30, in is_available
    return torch._C._cuda_getDeviceCount() > 0
RuntimeError: cuda runtime error (30) : unknown error at torch/csrc/cuda/Module.cpp:109

apaszke · March 16, 2017, 3:36pm

There must be something wrong with your driver. Maybe try rebooting?

ycszen · March 17, 2017, 8:26am

OK, I have found the problem. After I updated the Linux system, the driver stopped working, so I will reinstall it. Thank you for your reply.
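
For scripts that should fall back to the CPU instead of crashing while the driver is in this state, one option is to wrap the availability check. This is only a sketch: cuda_usable and its probe parameter are names made up for this example, and it assumes the failure surfaces as a RuntimeError, as in the traceback above. It does not repair the driver.

```python
import sys


def cuda_usable(probe=None):
    """Return True only if the CUDA probe succeeds and reports a GPU.

    `probe` is a hypothetical hook for this sketch; by default it is
    torch.cuda.is_available, imported lazily so that the function can
    also be exercised with a stub.
    """
    if probe is None:
        import torch
        probe = torch.cuda.is_available
    try:
        return bool(probe())
    except RuntimeError as exc:
        # e.g. "cuda runtime error (30) : unknown error"
        print("CUDA check failed, falling back to CPU:", exc, file=sys.stderr)
        return False
```

A caller could then write `use_gpu = cuda_usable()` once at startup and branch on the result.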

chrisranderson · May 18, 2017, 10:37pm

I get this when I put my laptop to sleep while in the middle of training. When I put it to sleep, my script stops and I get this error:

THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generated/../THCReduceAll.cuh line=334 error=4 : unspecified launch failure
Traceback (most recent call last):
  File "trytry.py", line 111, in <module>
    loss = network.loss(prediction, label_batch) + 10*torch.mean(cheat_amount)
  File "trytry.py", line 73, in loss
    union = 1e-5 + prediction.sum() + label.sum()
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 437, in sum
    return Sum(dim)(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/reduce.py", line 16, in forward
    return input.new((fn(),))
RuntimeError: cuda runtime error (4) : unspecified launch failure at /b/wheel/pytorch-src/torch/lib/THC/generated/../THCReduceAll.cuh:334

And afterward I get this:

THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/THCGeneral.c line=66 error=30 : unknown error
Traceback (most recent call last):
  File "trytry.py", line 77, in <module>
    network = Net()
  File "trytry.py", line 57, in __init__
    self.squeezenet = models.squeezenet1_1(pretrained=True).features.cuda() 
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/local/lib/python3.5/dist-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 272, in __new__
    _lazy_init()
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/__init__.py", line 85, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /b/wheel/pytorch-src/torch/lib/THC/THCGeneral.c:66

A reboot fixed the problem. This is with CUDA 8.0 and an NVIDIA GTX 1060.

kindlychung · August 20, 2017, 5:24pm

I also have the same issue after the laptop wakes up. I think this is a bug. TensorFlow seems to work fine in such situations.

arogozhnikov · September 12, 2017, 3:56pm

Just had the same failure after a sleep/wake cycle on a desktop. PyTorch 0.2, Ubuntu 16.04.

hvasbath · November 9, 2017, 9:34pm

Same here! GTX 1050 Ti, Ubuntu 16.04. A reboot fixes it, but a single short sleep and wake breaks it again!

psavine42 · December 7, 2017, 11:16pm

Same here. Ubuntu 16.04, CUDA 9, PyTorch 0.2.

Ke_Bai · February 19, 2018, 3:31pm

Has anyone solved this problem? Thanks.

danakianfar · March 6, 2018, 2:59pm

This also happens on my system:

Ubuntu 16.04
Nvidia GeForce 940MX
PyTorch 0.3.1 running on Python 3.6
CUDA 8.0
cuDNN 7

Any clues? I don’t see why this thread is marked as resolved, if the solution is to restart the laptop every time.

Ste_Millington · March 22, 2018, 11:55pm

Same problem for me too:

Ubuntu 16.04 running on a Dell desktop
Nvidia GeForce 1050 Ti
PyTorch 0.3.1.post2 running on Python 3.6
CUDA 9.1
cuDNN 7.1

Adam_Harrison · March 28, 2018, 7:59pm

I ran into the same problem:

Ubuntu 16.04
Titan Xp
CUDA 9.1
PyTorch 0.3.1 running on Python 2.7

Jimmy_Xiaoke_Shen · April 21, 2018, 6:14pm

A reboot fixes the problem.

Marat · May 6, 2018, 6:47pm

Same problem: strange behavior after wake-up (desktop, Ubuntu 16.04, CUDA 8, GTX 1080).

happytaoxiaoli · July 9, 2018, 2:22am

Running with sudo may solve this problem.
I reinstalled my driver and CUDA after my Linux system was updated, and the same problem happened.

waleeka · August 25, 2018, 7:17am

Same problem here. After the laptop goes to sleep and wakes up, I get this error when calling torch.cuda.current_device():

RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/THCGeneral.cpp:70

Ubuntu 18.04, PyTorch 0.4.1, CUDA 9.2.

farhat_Ullah · August 29, 2018, 12:28pm

I have an HP 1000 laptop without a GPU. How can I handle this error?

THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=32 error=30 : unknown error
Traceback (most recent call last):
  File "train_nli.py", line 62, in <module>
    torch.cuda.set_device(params.gpu_id)
  File "/home/farhatullah/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 262, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (30) : unknown error at torch/csrc/cuda/Module.cpp:32

I am actually training the InferSent sentence embedding model. Is there a CPU version of this code: https://github.com/facebookresearch/InferSent/blob/master/train_nli.py ?
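
On a machine without a GPU, one way to avoid calling torch.cuda.set_device at all is to decide on the device first. The helper below is a hypothetical sketch, not part of InferSent: pick_device is an invented name, gpu_id stands in for params.gpu_id from the traceback, and cuda_ok is whatever your availability check returned.

```python
def pick_device(gpu_id, cuda_ok):
    """Return a device string: "cuda:N" when a GPU is usable, else "cpu".

    gpu_id  -- requested GPU index; a negative value means "no GPU wanted"
               (a convention assumed for this sketch only)
    cuda_ok -- result of the caller's CUDA availability check
    """
    if cuda_ok and gpu_id >= 0:
        return "cuda:{}".format(gpu_id)
    return "cpu"


if __name__ == "__main__":
    # No GPU present: the script should stay on the CPU.
    print(pick_device(0, cuda_ok=False))
```

Assuming PyTorch 0.4 or later, the result could be passed to torch.device, e.g. torch.device(pick_device(params.gpu_id, torch.cuda.is_available())), and the model and tensors moved with .to(device). If is_available itself raises on your install, pass False instead. Any explicit .cuda() calls in the script would need the same treatment.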

miladiouss · March 7, 2019, 10:01am

The solution can be found here. Basically, reload the NVIDIA kernel modules by running the following commands in the terminal:

sudo rmmod nvidia_uvm
sudo rmmod nvidia
sudo modprobe nvidia
sudo modprobe nvidia_uvm
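
If you want to script this, here is a rough Python wrapper around the same four commands. The function names are invented for this example, and it defaults to a dry run that only prints what it would execute. Note that rmmod typically refuses to unload a module that is still in use (for example by a running display server or another CUDA process), so the real run may fail on some setups.

```python
import subprocess

# Module names taken from the commands above. nvidia_uvm is unloaded
# first and loaded last, matching the order in the post.
UNLOAD_ORDER = ["nvidia_uvm", "nvidia"]


def reload_commands():
    """Build the rmmod/modprobe command lines in the order given above."""
    unload = [["sudo", "rmmod", name] for name in UNLOAD_ORDER]
    load = [["sudo", "modprobe", name] for name in reversed(UNLOAD_ORDER)]
    return unload + load


def reload_nvidia_modules(dry_run=True):
    """Print each command, and execute it unless dry_run is True."""
    for cmd in reload_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    reload_nvidia_modules(dry_run=True)
```

Call reload_nvidia_modules(dry_run=False) after a wake-up to actually run the commands.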

Mohamed_Ghadban · April 10, 2019, 7:12am

Go to NVIDIA Nsight Options and set ‘Enable Crash Detection And Handling = True’.

Did the trick for me.

AndreiCostinescu · April 16, 2019, 4:08pm

This always works for me (Windows 10, CUDA 10.1, Python 3.7.2, PyTorch 1.0.1, NVIDIA GTX 1050 Ti):

import torch
torch.cuda.current_device()

but this always fails for me:

import torch
torch.cuda.is_available()
torch.cuda.current_device()  # fails here

@Mohamed_Ghadban, how can I access the NVIDIA Nsight Options? Thanks in advance