Help: why does torch.cuda.is_available() return True when my GPU doesn't work?

The CUDA driver is 11.2 and the cudatoolkit is 11.0. After import torch, torch.cuda.is_available() returns True.

How did you check that the GPU isn’t working?
What does nvidia-smi show after you push the model and inputs to the GPU and execute a forward pass?

Similar issues.

PyTorch version: 1.7.1+cu110
Cuda version: NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2
(This is the latest beta driver for Ubuntu; needed to fix persistent driver crash.)
torch.cuda.is_available(): True
When running a model: RuntimeError: CUDA error: no kernel image is available for execution on the device

Now, I am a PyTorch/Cuda newbie, so user error is a real possibility.

What are the compatibility rules for Cuda/PyTorch? Should the Cuda 11.0 PyTorch build work with 11.2? A doc on the Nvidia web site suggests it should, FWIW. Or, should I rebuild from sources as suggested for other instances of this issue?

Your local CUDA installation won’t be used if you are installing the binaries, which ship with their own CUDA runtime. You would thus only need the appropriate driver.

The error message is raised if you are using a build which doesn’t support your GPU architecture, so which GPU are you using?

Well, I am using an RTX2070s, and after I rolled my CUDA version back to 10.2 and my PyTorch version to 1.6 it just works, so I think the newest version doesn’t have good compatibility.

I cannot reproduce this error using the 1.7.1 binaries with CUDA11.0 as well as CUDA10.2 and this code snippet on an RTX2070:

import torch

print(torch.cuda.get_device_name())
print(torch.__version__)
print(torch.version.cuda)
x = torch.randn(1).cuda()
print(x)

Result:

GeForce RTX 2070
1.7.1
11.0
tensor([1.7284], device='cuda:0')
...
GeForce RTX 2070
1.7.1
10.2
tensor([0.8304], device='cuda:0')

so the binaries should contain the needed architecture code.

@ptrblck, thanks much for your response. You asked about my GPU:

  In: torch.cuda.get_device_name()
  Out: GeForce GT 710

Found this link to supported Cuda products; the GT 710 is not listed. Yet, the product box claims Cuda support, nvidia-smi gives the info listed earlier and the Nvidia UI claims it has 192 Cuda cores.

Maybe I was a bit too cheap in getting the lowest-cost GPU that supports both a 4K screen and (supposedly) Cuda…

So, the first step is to verify that the GT 710 does support Cuda. Is there a non-PyTorch way to do that?

If that checks out, I can try upgrading to the just-released 460 driver. (Upgrading Nvidia drivers on Linux Mint is a nightmare.)

Your GT710 supports CUDA and uses a compute capability of 3.5.
The PyTorch binaries are built for a compute capability >= 3.7 as seen here, so you would need to build PyTorch from source as described here.
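
For a check that does not involve PyTorch at all, one option is to ask the CUDA driver directly for each device's compute capability. The following is a minimal sketch, assuming a Linux system where the driver library is installed as libcuda.so.1; the function name query_cuda_devices is made up for illustration. It uses the driver API calls cuInit, cuDeviceGetCount, cuDeviceGet, cuDeviceGetName and cuDeviceGetAttribute through ctypes.

```python
import ctypes

# CUdevice_attribute enum values as defined in cuda.h
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76


def query_cuda_devices(lib_name="libcuda.so.1"):
    """Return a list of (name, major, minor) tuples, one per CUDA device.

    Returns an empty list if the driver library cannot be loaded or
    reports no usable device. The library name is an assumption that
    fits a typical Linux driver install.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        return []  # driver library not found

    # Every driver API call returns 0 (CUDA_SUCCESS) on success.
    if cuda.cuInit(0) != 0:
        return []

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []

    devices = []
    for ordinal in range(count.value):
        dev = ctypes.c_int(0)  # CUdevice is a plain int handle
        if cuda.cuDeviceGet(ctypes.byref(dev), ordinal) != 0:
            continue
        name = ctypes.create_string_buffer(256)
        cuda.cuDeviceGetName(name, len(name), dev)
        major = ctypes.c_int(0)
        minor = ctypes.c_int(0)
        cuda.cuDeviceGetAttribute(
            ctypes.byref(major),
            CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev)
        cuda.cuDeviceGetAttribute(
            ctypes.byref(minor),
            CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev)
        devices.append((name.value.decode(), major.value, minor.value))
    return devices
```

Calling query_cuda_devices() and printing the result lists each device name with its major and minor compute capability, which can then be compared against what the installed PyTorch binary was built for.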

@ptrblck, thanks for the quick answer. I’ll look into doing the compile as you suggest.

I wonder, does it make sense to add a “cudaAudit()” function that could do all these checks and print out problems? Or, is there one already?

You can check the built-in compute capabilities in the currently used binary via:

print(torch.cuda.get_arch_list())
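
A rough version of the "cudaAudit()" idea can be built from that call plus torch.cuda.get_device_capability(). The sketch below is not an existing PyTorch function; parse_arch, binary_supports and audit_current_device are hypothetical names. It applies a simplified form of the usual CUDA compatibility rule: an sm_XY binary runs on devices with the same major version and an equal or higher minor version, and a compute_XY (PTX) entry can be compiled at load time for any equal or newer device. Architecture-specific variants with letter suffixes are treated like their base architecture here, which is an approximation.

```python
def parse_arch(arch):
    """Split an entry such as 'sm_75' into ('sm', 7, 5); None if unrecognized."""
    kind, _, number = arch.partition("_")
    digits = "".join(ch for ch in number if ch.isdigit())
    if kind not in ("sm", "compute") or len(digits) < 2:
        return None
    return kind, int(digits[:-1]), int(digits[-1])


def binary_supports(capability, arch_list):
    """Return True if a build with arch_list can run on the given capability.

    capability is a (major, minor) tuple; arch_list is a list of strings
    in the format returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # Native code: same major version, built for an equal or lower minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def audit_current_device():
    """Print and return whether the installed binary covers the current GPU."""
    import torch  # imported here so the helpers above work without PyTorch
    capability = torch.cuda.get_device_capability()
    arch_list = torch.cuda.get_arch_list()
    ok = binary_supports(capability, arch_list)
    print(f"device capability: {capability}, built for: {arch_list}, supported: {ok}")
    return ok
```

With an illustrative arch list whose lowest entry is sm_37, binary_supports((3, 5), ...) returns False, which matches the "no kernel image is available" error reported for the GT 710 above.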

My goal is to learn PyTorch, hence my attempt to use my existing low-end GPU card to get started. For any others who might find this: a bit of reading suggests that was a naive idea.

I’ll try “plan B”: use the CPU and offload to the cloud for occasional heavy lifting until I’m ready to invest in a proper high-end card for serious work.

I don’t think learning PyTorch would be bottlenecked by your “low-end” GPU and you could certainly learn all basic concepts. Of course you won’t be able to train huge models on large datasets, but it also depends what exactly you try to learn first.

facing the same issue

GPU:1080Ti
Cuda: 10.1 (nvcc -V)

tried pytorch 1.6, 1.7.0+cu101, 1.7.1+cu101
The same settings were working previously but after NVIDIA driver update to 460.39 it stopped working.

Do you see an error while running your script? If so, could you post the error message including the complete stack trace?

No, there’s no error. The program returns True for:

torch.cuda.is_available()

but runs the code on cpu. GPU usage remains ~0% on nvidia-smi

If you are transferring the data to the GPU via model.cuda() or model.to('cuda'), the GPU will be used. Otherwise an error would be raised.
A low GPU utilization might come from different bottlenecks in your code, e.g. the data loading.
As a quick check you could run a simple matrix multiplication on the GPU in a loop and should see some GPU utilization in nvidia-smi.
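
The quick check could look roughly like this. It is a minimal sketch: the function name matmul_burn, the matrix size and the duration are arbitrary choices, not values from this thread. Run it in one terminal and watch nvidia-smi in another.

```python
import time


def matmul_burn(seconds=10.0, size=4096):
    """Run square matrix multiplications on the GPU for about `seconds`.

    Returns the number of multiplications completed. Raises RuntimeError
    if CUDA is not available.
    """
    import torch  # imported here so the file can be loaded without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available")

    device = torch.device("cuda")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    iterations = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        c = a @ b
        # CUDA kernels are launched asynchronously; wait for the result so
        # the loop count reflects finished work rather than queued work.
        torch.cuda.synchronize()
        iterations += 1
    return iterations
```

If nvidia-smi shows clear utilization while matmul_burn() runs but stays near 0% during training, the GPU itself is working and the bottleneck is more likely elsewhere in the training script.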

model.to('cuda') is being used for that. I’ll check with the matmul operation.

I think I’ve found the issue; you’re right, it’s different bottlenecks in the code. Thanks!

I am facing the same issue: torch.cuda.is_available() returns True and torch.cuda.get_device_name(torch.cuda.current_device()) returns NVIDIA GeForce GTX 1060, but the code is running on the CPU.

Did you push the data and model to the GPU? If so and if you haven’t received an error, the GPU will be used. In case the utilization is low, you could profile your code and check where the bottleneck might be (e.g. maybe in the data loading).
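
Before reaching for a full profiler, a coarse first measurement is to time how long the loop waits for the next batch versus how long it spends in the training step. The helper below is a sketch with a hypothetical name (split_loop_time). It takes any iterable of batches and any callable step; the optional sync argument is meant for something like torch.cuda.synchronize, so that asynchronous GPU work is counted in the step time.

```python
import time


def split_loop_time(batches, step, sync=None):
    """Return (seconds spent waiting for data, seconds spent in step).

    batches: any iterable, e.g. a DataLoader
    step:    callable taking one batch, e.g. forward/backward/optimizer step
    sync:    optional callable run after each step, e.g. torch.cuda.synchronize
    """
    load_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
    return load_time, step_time
```

A call such as split_loop_time(train_loader, train_step, sync=torch.cuda.synchronize), where train_loader and train_step stand in for your own objects, gives two totals to compare. If the data-loading total dominates, low GPU utilization in nvidia-smi is expected even though the model runs on the GPU.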