Warning for old GPU and cuda runtime error with fast.ai lesson

shaun · February 21, 2018, 12:28am

Hello,

I started working on fast.ai lessons and ran into some problems. I’m running the code locally on my computer. Here are my computer specs:

64-bit Ubuntu 16.04
GTX 770 with 2GB RAM

I’ve installed 64-bit Anaconda3 with Python 3.6.4 and used conda to install pytorch:

conda install pytorch torchvision cuda90 -c pytorch

My cuda version is 9.0:

~ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

I have cudnn version 7 installed. When I run through the first lesson where I call the training part, I get a warning for old GPU:

/home/username/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/cuda/__init__.py:97: UserWarning:
Found GPU0 GeForce GTX 770 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.

warnings.warn(old_gpu_warn % (d, name, major, capability[1]))

followed by a CUDA runtime error:

RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorMath.cu:15

The error message is long and the rest is just traceback, so I didn’t include it. It’s been a while since I last ran pytorch, and back then I did not get this error or warning. Is my GPU too old for the current version? Any suggestions to fix this error?

Thanks.
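
To confirm which compute capability PyTorch actually sees on a machine, a check along these lines may help. This is a minimal sketch: `is_older_than`, `describe`, and `ASSUMED_MINIMUM` are hypothetical names, and the (3, 5) threshold is an assumption taken from the lowest entry in the arch list quoted later in this thread, not a documented PyTorch constant.

```python
# Hypothetical threshold; an assumption, not a value read from PyTorch itself.
ASSUMED_MINIMUM = (3, 5)


def is_older_than(capability, minimum):
    """Return True if a (major, minor) capability is below the given minimum."""
    return tuple(capability) < tuple(minimum)


def describe(name, capability, minimum=ASSUMED_MINIMUM):
    """Build a one-line summary of a device and how it compares to the minimum."""
    status = "below" if is_older_than(capability, minimum) else "at or above"
    return "{}: compute capability {}.{} ({} assumed minimum {}.{})".format(
        name, capability[0], capability[1], status, minimum[0], minimum[1]
    )


# torch is optional here so the helpers above can be used without it.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    # get_device_capability returns a (major, minor) tuple for the device.
    print(describe(torch.cuda.get_device_name(0),
                   torch.cuda.get_device_capability(0)))
```

For the GTX 770 in this post, `describe("GeForce GTX 770", (3, 0))` would report the card as below the assumed minimum.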

smth · February 21, 2018, 4:24am

For now, we suggest source installs of PyTorch that will support your GPU: https://github.com/pytorch/pytorch#from-source

Your GPU is old enough that we are no longer committing to maintain PyTorch support for it.
For the near future, it should continue working.

shaun · February 21, 2018, 10:42am

@smth Thanks. I installed pytorch from source following those instructions.

I created a new conda environment and tried to import torch and got a module not found error. Importing works on the “base” environment though. I’m guessing this is some path issue. Any suggestions for a fix?

Currently, the pytorch directory is located at ~/anaconda3/pytorch. As I mentioned earlier, in the base environment I am able to import torch. In a new conda environment, I added the pytorch path:

import sys; sys.path.append('/home/username/anaconda3/pytorch')

However, when I import torch now, I get the following error:

Traceback (most recent call last):
File "", line 1, in
File "/home/username/anaconda3/pytorch/torch/__init__.py", line 77, in
from torch._C import *
ModuleNotFoundError: No module named 'torch._C'

I made sure that I’m not in the pytorch source directory as this seemed to be a problem (as per here). Still no success.
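
One way to narrow this down is to check which interpreter is running and where it would load torch from. Each conda environment has its own site-packages, so a source install made in "base" is not visible from a new environment, and pointing sys.path at the source checkout most likely finds only the Python files, not the compiled torch._C extension. The sketch below uses only the standard library; `locate_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None.

    find_spec only looks the module up; it does not import it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Shows which environment's Python is in use.
print("interpreter:", sys.executable)
# None here means torch is not installed in this environment.
print("torch found at:", locate_module("torch"))
```

If the second line prints None in the new environment, running the source install again with that environment activated is probably a cleaner fix than editing sys.path.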

mh_wasil · February 27, 2018, 1:08am

I have the same problem and I tried to install PyTorch from the source. I am able to import PyTorch but the cuda runtime error is still there.

RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorMath.cu:15

my computer specs:
64-bit Ubuntu 16.04
GTX 950M with 2GB RAM
Anaconda3 with python3.5

smth · February 27, 2018, 8:58am

@mh_wasil You first have to uninstall your PyTorch binary install before installing from source. Your stack trace shows the path /opt/conda/conda-bld/pytorch_1518244421288/work, which means that you are still using the binary install.
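
The same check can be written as a small heuristic, shown here as a sketch. It only looks for the conda-bld directory in the source path reported by the error message; `source_path` and `looks_like_conda_binary` are hypothetical names, and a source build done under a directory with that name would fool it.

```python
import re


def source_path(error_line):
    """Extract the file path after ' at ' from an error line ending in 'path:line'."""
    match = re.search(r" at (\S+):(\d+)\s*$", error_line)
    return match.group(1) if match else None


def looks_like_conda_binary(error_line):
    """Heuristic: binary packages built by conda report paths under conda-bld."""
    path = source_path(error_line)
    return path is not None and "/conda-bld/" in path


message = (
    "RuntimeError: cuda runtime error (48) : no kernel image is available "
    "for execution on the device at /opt/conda/conda-bld/"
    "pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorMath.cu:15"
)
print(looks_like_conda_binary(message))
```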

adamwespiser · February 27, 2018, 8:59pm

Hi Shaun,
I’m doing the fast.ai course now, and using GCE. In my experience, the fast.ai code is still very much ‘research code’, and not hardened enough to work very well outside of the specific problem, dataset, and algorithm used by fast.ai. I’ve been pretty frustrated by it! If you dig deeper into the fast.ai codebase, it’s not that hard to see it’s just a few wrappers for dataset loading and some cyclical learning rate stuff. Good luck!

mmcm · March 7, 2018, 8:32pm

Because this is one of the first results when googling this error:

for me, this happened after I compiled from source on my local machine, then tried to run my code on another machine (compute cluster) with a different GPU architecture and compute capability.

I am currently compiling pytorch again, this time with TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" (taken from the Dockerfile), to include compute capabilities not present on the machine compiling pytorch. Fingers crossed that helps.
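
Why the arch list matters can be sketched as follows. Under CUDA's compatibility rules, a compiled kernel for capability X.y runs on devices X.z with z >= y, and an entry marked +PTX can be JIT-compiled for any equal or newer capability. The helpers `parse_arch_list` and `is_covered` are hypothetical, and the parser assumes a purely numeric list separated by spaces or semicolons, as in the string above.

```python
def parse_arch_list(arch_list):
    """Split an arch list string into (major, minor, has_ptx) tuples."""
    entries = []
    for token in arch_list.replace(";", " ").split():
        has_ptx = token.endswith("+PTX")
        version = token[:-len("+PTX")] if has_ptx else token
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def is_covered(capability, arch_list):
    """Return True if a device with this (major, minor) could run the build."""
    dev_major, dev_minor = capability
    for major, minor, has_ptx in parse_arch_list(arch_list):
        # Compiled kernels: same major version, equal or lower minor version.
        if major == dev_major and minor <= dev_minor:
            return True
        # PTX: forward-compatible with any equal or newer capability.
        if has_ptx and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


ARCH_LIST = "3.5 5.2 6.0 6.1 7.0+PTX"
print(is_covered((3, 0), ARCH_LIST))  # capability reported for the GTX 770 above
print(is_covered((6, 1), ARCH_LIST))
```

With this list, a capability 3.0 device is not covered, which is consistent with the "no kernel image is available" error reported at the top of the thread.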

Houjing_Huang · January 21, 2019, 4:17am

I have the same use case. Your solution is awesome!