Help: why does torch.cuda.is_available() return True but my GPU doesn’t work?

The CUDA driver is 11.2 and the cudatoolkit is 11.0, and after import torch, torch.cuda.is_available() returns True.

How did you check that the GPU isn’t working?
What does nvidia-smi show after you push the model and inputs to the GPU and execute a forward pass?
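
For example, a minimal sketch along these lines (a toy model just for illustration) should show memory allocated and some utilization in nvidia-smi while it runs:

import torch
import torch.nn as nn

# push a toy model and an input batch to the GPU and run a forward pass;
# check nvidia-smi in another terminal while this is running
device = torch.device('cuda')
model = nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)
out = model(x)
torch.cuda.synchronize()  # make sure the kernel actually ran
print(out.shape, out.device)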

Similar issue here.

PyTorch version: 1.7.1+cu110
Cuda version: NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2
(This is the latest beta driver for Ubuntu; needed to fix persistent driver crash.)
torch.cuda.is_available(): True
When running a model: RuntimeError: CUDA error: no kernel image is available for execution on the device

Now, I am a PyTorch/Cuda newbie, so user error is a real possibility.

What are the compatibility rules for CUDA/PyTorch? Should the CUDA 11.0 PyTorch build work with a CUDA 11.2 driver? A doc on the NVIDIA website suggests it should, FWIW. Or should I rebuild from source, as suggested for other instances of this issue?

Your local CUDA installation won’t be used if you are installing the binaries, which ship with their own CUDA runtime. You would thus only need the appropriate driver.

The error message is raised if you are using a build that doesn’t support your GPU architecture, so which GPU are you using?

Well, I have been using an RTX 2070S, and after rolling my CUDA version back to 10.2 and my PyTorch version back to 1.6 it just works, so I think the newest version doesn’t have good compatibility.

I cannot reproduce this error using the 1.7.1 binaries with CUDA 11.0 as well as CUDA 10.2 and this code snippet on an RTX 2070:

import torch

print(torch.cuda.get_device_name())
print(torch.__version__)
print(torch.version.cuda)
x = torch.randn(1).cuda()
print(x)

Result:

GeForce RTX 2070
1.7.1
11.0
tensor([1.7284], device='cuda:0')
...
GeForce RTX 2070
1.7.1
10.2
tensor([0.8304], device='cuda:0')

so the binaries should contain the needed architecture code.

@ptrblck, thanks much for your response. You asked about my GPU:

  In: torch.cuda.get_device_name()
  Out: GeForce GT 710

Found this link to supported Cuda products; the GT 710 is not listed. Yet, the product box claims Cuda support, nvidia-smi gives the info listed earlier and the Nvidia UI claims it has 192 Cuda cores.

Maybe I was a bit too cheap in getting the lowest-cost GPU that supports both a 4K screen and (supposedly) Cuda…

So, the first step is to verify that the GT 710 does support Cuda. Is there a non-PyTorch way to do that?

If that checks out, I can try upgrading to the just-released 460 driver. (Upgrading Nvidia drivers on Linux Mint is a nightmare.)

Your GT710 supports CUDA and uses a compute capability of 3.5.
The PyTorch binaries are built for a compute capability >= 3.7 as seen here, so you would need to build PyTorch from source as described here.
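
As a non-PyTorch sanity check of the compute capability, one could also query the CUDA driver API directly via ctypes (a rough sketch, assuming Linux and that libcuda.so.1 is on the loader path):

import ctypes

# load the CUDA driver library directly (no PyTorch involved)
cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0

count = ctypes.c_int()
assert cuda.cuDeviceGetCount(ctypes.byref(count)) == 0

for i in range(count.value):
    dev = ctypes.c_int()
    assert cuda.cuDeviceGet(ctypes.byref(dev), i) == 0
    name = ctypes.create_string_buffer(100)
    assert cuda.cuDeviceGetName(name, 100, dev) == 0
    major, minor = ctypes.c_int(), ctypes.c_int()
    # cuDeviceComputeCapability is deprecated but still exported by the driver
    assert cuda.cuDeviceComputeCapability(ctypes.byref(major), ctypes.byref(minor), dev) == 0
    print(f"{name.value.decode()}: compute capability {major.value}.{minor.value}")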

@ptrblck, thanks for the quick answer. I’ll look into doing the compile as you suggest.

I wonder, does it make sense to add a “cudaAudit()” function that could do all these checks and print out problems? Or, is there one already?

You can check the built-in compute capabilities in the currently used binary via:

print(torch.cuda.get_arch_list())
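
As a rough sketch of the kind of “audit” asked about above (a hypothetical helper, not part of PyTorch, using only public torch.cuda calls):

import torch

def cuda_audit():
    # compare what the installed binary was built for vs. what the device offers
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    major, minor = torch.cuda.get_device_capability()
    arch = f"sm_{major}{minor}"
    arch_list = torch.cuda.get_arch_list()
    print(f"device: {torch.cuda.get_device_name()}")
    print(f"compute capability: {major}.{minor}")
    print(f"binary built for: {arch_list}")
    # heuristic only: a newer GPU can still run via PTX JIT if a compute_XX entry is present
    if arch not in arch_list:
        print(f"warning: {arch} not in the built arch list; kernels may fail with "
              "'no kernel image is available for execution on the device'")

cuda_audit()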

My goal is to learn PyTorch, hence my attempt to use my existing low-end GPU card to get started. For any others who might find this: a bit of reading suggests that was a naive idea.

I’ll try “plan B”: use the CPU and offload to the cloud for occasional heavy lifting until I’m ready to invest in a proper high-end card for serious work.

I don’t think learning PyTorch would be bottlenecked by your “low-end” GPU, and you could certainly learn all the basic concepts. Of course you won’t be able to train huge models on large datasets, but it also depends on what exactly you want to learn first.


I’m facing the same issue.

GPU: 1080 Ti
CUDA: 10.1 (nvcc -V)

I tried PyTorch 1.6, 1.7.0+cu101, and 1.7.1+cu101.
The same settings were working previously, but after updating the NVIDIA driver to 460.39 it stopped working.

Do you see an error while running your script? If so, could you post the error message including the complete stack trace?

No, there’s no error; the program returns True for:

torch.cuda.is_available()

but it runs the code on the CPU. GPU usage remains ~0% in nvidia-smi.

If you are transferring the model and data to the GPU via model.cuda() or model.to('cuda'), the GPU will be used. Otherwise an error would be raised.
A low GPU utilization might come from different bottlenecks in your code, e.g. the data loading.
As a quick check you could run a simple matrix multiplication on the GPU in a loop and should see some GPU utilization in nvidia-smi.
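
For instance, a minimal sketch of such a check:

import torch

# run matrix multiplications on the GPU in a loop and watch the utilization
# in nvidia-smi while this is running
device = torch.device('cuda')
x = torch.randn(4096, 4096, device=device)
for _ in range(1000):
    y = torch.matmul(x, x)
torch.cuda.synchronize()  # wait for the kernels to finish
print(y.sum().item())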

model.to('cuda') is being used for that. I’ll check with the matmul operation.

I think I’ve found the issue; you’re right, it’s a different bottleneck in the code. Thanks!

I am facing the same issue: torch.cuda.is_available() returns True, and torch.cuda.get_device_name(torch.cuda.current_device()) returns NVIDIA GeForce GTX 1060, but the code is running on the CPU.

Did you push the data and model to the GPU? If so, and if you haven’t received an error, the GPU will be used. In case the utilization is low, you could profile your code and check where the bottleneck might be (e.g. maybe in the data loading).
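
As a rough sketch of such a profiling check (a toy model standing in for the real one; the printed table shows whether the time is spent in CUDA kernels or on the CPU, e.g. in the data loading):

import torch
import torch.nn as nn

device = torch.device('cuda')
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
x = torch.randn(256, 512, device=device)

# profile a few forward passes; use_cuda=True also records the CUDA kernel times
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for _ in range(50):
        out = model(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))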