Hi KFrank,
Thanks for your message. Let me elaborate on what I have done.
For my particular GPU, I was able to get the GPU and CUDA working together (as evidenced by running the deviceQuery binary that gets installed along with CUDA). This GPU, which has CUDA compute capability 2.0, works with CUDA 8.
The most recent version of PyTorch will not work with this GPU, so I went to the PyTorch website and looked for a previous version of PyTorch that works with CUDA 8 (those versions can be found here); that turned out to be version 1.0.0. I installed it with
pip install torch==1.0.0 torchvision==0.2.1
then tested PyTorch to see whether it could find and identify my GPU:
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.current_device()
/home/user/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:117: UserWarning:
Found GPU0 Quadro 4000 which is of cuda capability 2.0.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7fb833751d30>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'Quadro 4000'
>>> torch.cuda.is_available()
True
At this point, I am assuming that CUDA, my GPU, and this older version of PyTorch are all playing nicely together. However, when I go to train my model, I get the RuntimeError I posted originally (copied below):
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /pytorch/aten/src/THC/generic/THCTensorMath.cu:238
Here is the full output of the error, if that would be more useful:
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=238 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "run_lm_finetuning.py", line 548, in <module>
    main()
  File "run_lm_finetuning.py", line 500, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_lm_finetuning.py", line 206, in train
    output_device=args.local_rank)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 215, in __init__
    self.broadcast_bucket_size)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 377, in _dist_broadcast_coalesced
    dist._dist_broadcast_coalesced(self.process_group, tensors, buffer_size, False)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /pytorch/aten/src/THC/generic/THCTensorMath.cu:238
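For what it's worth, my reading of error 48 (cudaErrorNoKernelImageForDevice) is that the wheel simply contains no compiled kernels (no cubin or PTX) for sm_20, so the runtime can enumerate the GPU fine, but the first real kernel launch fails. A toy sketch of that situation — the arch list here is an assumption for illustration, not the actual build configuration of the 1.0.0 wheel:

```python
# Toy model of "no kernel image is available for execution on the device":
# the binary ships kernels only for certain architectures, and sm_20
# (compute capability 2.0) is not among them. BUILT_ARCHES is assumed.
BUILT_ARCHES = {"sm_30", "sm_35", "sm_50", "sm_60", "sm_61", "sm_70"}

def arch_for(capability):
    """Map a (major, minor) compute capability to its sm_XY arch string."""
    major, minor = capability
    return f"sm_{major}{minor}"

def has_kernel_image(capability):
    """True if the (assumed) binary contains kernels for this device."""
    return arch_for(capability) in BUILT_ARCHES

print(has_kernel_image((2, 0)))  # -> False: no kernel image for sm_20
```

That would explain why the device checks in the interactive session all pass, yet the error only appears once training actually launches kernels on the GPU.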
EDIT:
One last bit of information that might be useful: I am calling the Python script as follows:
python -m torch.distributed.launch run_lm_finetuning.py
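In case it matters, torch.distributed.launch passes a --local_rank argument to the script it spawns, which is what feeds the args.local_rank seen in the traceback. A minimal sketch of how a script like run_lm_finetuning.py picks it up (the default value here is an assumption on my part):

```python
import argparse

# torch.distributed.launch appends --local_rank=<n> to each spawned
# process's argv; the training script reads it back with argparse.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="set automatically by torch.distributed.launch")
args = parser.parse_args(["--local_rank=0"])  # simulating the launcher
print(args.local_rank)  # -> 0
```

So even with a single GPU, launching this way goes through the DistributedDataParallel setup path, which is where the broadcast in the traceback is triggered.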