I have the exact same error.
I am running this code from the GoodNews repo:
[jalal@goku GoodNews]$ python train.py --cnn_weight data/resnet152-b121ed2d.pth
DataLoader loading json file: data/data_news.json
vocab size is 37200
DataLoader loading h5 file: data/data_news_label.h5 /scratch2/goodnewsdata/data_news_image.h5
read 489229 images of size 3x256x256
max sequence length in data is 31
assigned 445433 images to split train
assigned 19376 images to split val
assigned 24420 images to split test
WARNING:tensorflow:From train.py:49: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1553749772122/work/aten/src/THC/THCGeneral.cpp line=51 error=3 : initialization error
Traceback (most recent call last):
File "train.py", line 280, in <module>
train(opt)
File "train.py", line 81, in train
cnn_model.cuda()
File "/scratch/sjn-p3/anaconda/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 263, in cuda
return self._apply(lambda t: t.cuda(device))
File "/scratch/sjn-p3/anaconda/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 190, in _apply
module._apply(fn)
File "/scratch/sjn-p3/anaconda/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 196, in _apply
param.data = fn(param.data)
File "/scratch/sjn-p3/anaconda/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 263, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "/scratch/sjn-p3/anaconda/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch-nightly_1553749772122/work/aten/src/THC/THCGeneral.cpp:51
Closing remaining open files:data/data_news_label.h5...done/scratch2/goodnewsdata/data_news_image.h5...done
These files have the keyword ‘multiprocessing’:
[jalal@goku GoodNews]$ rg 'multiprocessing'
dataloader.py
11:# import multiprocessing
12:# from multiprocessing import Process
13:# from multiprocessing.dummy import Pool as ThreadPool
14:# from pathos.multiprocessing import Pool
get_data/get_imgs_only/get_images.py
12:# from multiprocessing.dummy import Pool as ThreadPool
get_data/with_article_urls/get_data_with_urls.py
14:# from multiprocessing.dummy import Pool as ThreadPool
get_data/with_api/get_data_api.py
9:import multiprocessing
15:from multiprocessing.dummy import Pool as ThreadPool
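If the initialization error comes from CUDA being touched inside a forked worker process (a guess based on the traceback and the multiprocessing hits above, not something I have verified in this repo), the usual workaround is to force the "spawn" start method before any CUDA call or DataLoader construction. A minimal sketch of what that would look like at the top of train.py:

```python
import multiprocessing as mp

def main():
    # With "fork", child processes inherit a half-initialized CUDA
    # context from the parent, which can surface as
    # "cuda runtime error (3) : initialization error".
    # With "spawn", each worker starts a fresh interpreter instead.
    print("start method:", mp.get_start_method())

if __name__ == "__main__":
    # force=True overrides any method set earlier by an import;
    # this must run before any CUDA work is done.
    mp.set_start_method("spawn", force=True)
    main()
```

This is only a sketch; whether it applies depends on where dataloader.py actually spawns its workers.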
However, I am not sure how to proceed. Could you please have a look at the files in the repo?
I have:
$ nvidia-smi
Fri Sep 18 23:25:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 Off | N/A |
| 0% 24C P8 19W / 250W | 55MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 0% 27C P8 12W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2637 G /usr/bin/X 39MiB |
| 0 N/A N/A 2903 G /usr/bin/gnome-shell 12MiB |
+-----------------------------------------------------------------------------+
and
$ python
Python 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.0.0.dev20190328'
>>> torch.version.cuda
'8.0.61'
>>> torch.cuda.is_available()
False
>>> torch.backends.cudnn.enabled
True
$ cat /usr/local/cuda/version.txt
CUDA Version 10.0.130
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
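One thing the output above already suggests: torch.version.cuda reports '8.0.61', while the system toolkit is 10.0.130 and the driver advertises CUDA 11.0, so the nightly wheel may simply have been built against a different CUDA than what is installed. A small sanity check making that mismatch explicit (version strings hard-coded from my session above; on another machine they would come from torch.version.cuda and nvcc --version):

```python
# Versions copied from the session output above.
torch_cuda = "8.0.61"      # CUDA the PyTorch binary was built against
toolkit_cuda = "10.0.130"  # from /usr/local/cuda/version.txt

def major_minor(version):
    """Return (major, minor) from a dotted version string like '10.0.130'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])

if major_minor(torch_cuda) != major_minor(toolkit_cuda):
    # A mismatch here usually means the wheel should be reinstalled
    # to match the local toolkit, or vice versa.
    print("PyTorch built for CUDA %s, but system toolkit is CUDA %s"
          % (torch_cuda, toolkit_cuda))
```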
I am not sure how to fix the problem from the compatibility table at https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver
More information is here: https://stackoverflow.com/questions/63965122/fixing-torch-cuda-is-available-when-it-is-false