Could you run your code with the CUDA_LAUNCH_BLOCKING=1 env variable and post the new stack trace, please?
You can do that by running CUDA_LAUNCH_BLOCKING=1 python your_script.py.
/home/quoniammm/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
Traceback (most recent call last):
File "HAN.py", line 264, in <module>
word_attn.cuda()
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
return self._apply(lambda t: t.cuda(device_id))
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 124, in _apply
param.data = fn(param.data)
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
return self._apply(lambda t: t.cuda(device_id))
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in _cuda
return new_type(self.size()).copy_(self, async)
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 266, in _lazy_new
_lazy_init()
File "/home/quoniammm/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 85, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/torch/lib/THC/THCGeneral.c:70
The result is the same as in the notebook. What is the use of CUDA_LAUNCH_BLOCKING=1?
I still feel confused about the cuda runtime error.
Then I uploaded my program to FloydHub and the error disappeared. What is the reason for that? My NVIDIA GPU is a GTX 1050. I know its memory is small. However, the error doesn't tell me it's a memory problem. It's weird.
CUDA_LAUNCH_BLOCKING makes cuda report the error where it actually occurs.
Since the problem is in the cuda initialization function and does not appear on a different machine, I would guess that your cuda install is not working properly. You may want to reinstall it and test it with the cuda samples.
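Before reinstalling, a quick check of whether PyTorch can see the GPU at all might look like the sketch below. The helper name cuda_diagnostics is made up for illustration. It only calls torch.cuda.is_available(), torch.cuda.device_count() and torch.version.cuda, and it is not a substitute for running the cuda samples.

```python
import importlib.util


def cuda_diagnostics():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda
    try:
        # These calls touch the driver, so a broken install may fail here.
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    except Exception as exc:
        report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False or an error is recorded on a machine that does have an NVIDIA GPU, that points at the driver or cuda install rather than at the model code.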
Last time, I just ignored it and ran the program on AWS. Today I ran a new program and hit the same problem. I just restarted the computer and the problem disappeared. It's so weird. What's the reason for it?
Can you explain it? Thanks.
Maybe it's completely unrelated to your problem, but this week Ubuntu updated the NVIDIA drivers on my machine (more or less automatically using the software updater) and PyTorch couldn't use cuda anymore. After a restart, the error vanished.
I'm also seeing this problem. If I have a screen temperature controller (Redshift) on startup, it won't work. If I watch Netflix, it won't work. Somewhat related: if I have any variables in CUDA memory in a Jupyter Notebook, I can't run Redshift.
Probably you could use os.environ['CUDA_LAUNCH_BLOCKING'] = 1 at the beginning of your notebook before importing any other library.
If that doesn't work, you could export the notebook as a Python script (.py) and run it in your terminal.
I guess you might be using Windows, which might not be able to use env variables in its default terminal (I'm just guessing, as I'm not using Windows)?
If so, you could try to set this env variable in the "System Properties" and rerun your script.
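As a minimal sketch of the notebook approach, assuming the cell runs before torch (or anything else that initializes CUDA) is imported: os.environ only accepts string values, so the value has to be "1" rather than the integer 1. The helper name enable_launch_blocking is just for illustration.

```python
import os


def enable_launch_blocking():
    """Set CUDA_LAUNCH_BLOCKING for this process (hypothetical helper)."""
    # Environment values must be strings; assigning the int 1 raises TypeError.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Run this at the very top of the notebook, before `import torch`.
enable_launch_blocking()
```

If torch was already imported and used in the same kernel, restarting the kernel and running this cell first is the safer option.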
Yeah, you're right, but I also tried the os.environ function. It would not allow me to give it an int, only a string, so I passed "1", but the error message stays the same…
Anyway, even with the default error message I found my error.
For those who still have the error and use BCELoss, make sure values are between 0 and 1.
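To illustrate why the range matters, here is a plain-Python sketch of binary cross entropy. It is not PyTorch's implementation, and the names sigmoid and binary_cross_entropy are only for this example. The loss takes log(p) and log(1 - p), which are undefined for p outside [0, 1], so raw model outputs have to be squashed first.

```python
import math


def sigmoid(x):
    """Map an unbounded score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(p, y):
    """Binary cross entropy for one prediction p and one target y in {0, 1}."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"prediction {p} is outside [0, 1]")
    eps = 1e-12  # keep log() away from exactly 0
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


raw_score = 2.5                    # unbounded model output (a logit)
prob = sigmoid(raw_score)          # now a valid probability
loss = binary_cross_entropy(prob, 1.0)
```

In PyTorch the equivalent fix is usually to apply torch.sigmoid to the model output before nn.BCELoss, or to pass the raw outputs to nn.BCEWithLogitsLoss instead.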
Hi, I am getting the same error while training my model on Google Colab Pro. How do I resolve it? Setting CUDA_LAUNCH_BLOCKING to 1 causes my Colab window to crash. It skips the code cell where I wrote this command and moves on further in the program.