Older GPUs don’t seem to be supported by torch, in spite of recent CUDA versions.
In my case the crash produces the following error:
/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/cuda/__init__.py:83: UserWarning:
Found GPU%d %s which is of cuda capability %d.%d.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is %d.%d.
warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
WARNING:lightwood-16979:Exception: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. when training model: <lightwood.model.neural.Neural object at 0x7f9c34df1e80>
Process LearnProcess-1:13:
Traceback (most recent call last):
File "/home/maxs/dev/mdb/venv38/sources/lightwood/lightwood/model/helpers/default_net.py", line 59, in forward
output = self.net(input)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
return F.linear(input, self.weight, self.bias)
File "/home/maxs/dev/mdb/venv38/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
How can I check for an older GPU that doesn’t support torch, without actually try/catching a tensor-to-GPU transfer? The transfer initializes CUDA, which wastes around 2GB of memory; that’s something I can’t afford, since I’d be running this check in dozens of processes, each of which would then waste an extra ~2GB due to the initialization.
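For reference, the kind of check I’m trying to avoid looks roughly like this sketch (the helper name is my own); it works, but the .to("cuda") call initializes a full CUDA context in every process that runs it:

import torch

def gpu_usable_naive():
    # Works, but initializing CUDA here costs ~2GB per process.
    try:
        torch.ones(1).to("cuda")
        return True
    except (AssertionError, RuntimeError):
        # RuntimeError on unsupported GPUs, AssertionError on CPU-only builds.
        return False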
I think the latest CUDA version available is 11.3; use the command provided in the PyTorch installation guide at https://pytorch.org. It automatically installs a CUDA-compatible PyTorch build.
Pardon if I misunderstood your request!
(Since the answer here might not be very torch-related, and might instead involve e.g. nvidia-specific tools, suggestions along those lines are welcome too.)
According to this comment, there is a relationship between the CUDA version and the deprecated compute capabilities you won’t be able to use. You can check the CUDA 10.2 Toolkit Docs, which list which compute capabilities are deprecated.
It might not be the best solution, but you could build a look-up table mapping CUDA versions to deprecated GPUs.
Then, using torch.cuda.get_device_name(torch.cuda.current_device()), you could check whether the code should be executed on GPU or CPU.
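A minimal sketch of that look-up-table idea (the table contents below are made-up placeholders; the real entries would come from the deprecation notes of whichever CUDA toolkit your torch build ships with):

import torch

# Hypothetical table: CUDA version -> device names deprecated for that toolkit.
# Populate it from the CUDA Toolkit release notes; these entries are examples only.
DEPRECATED_GPUS = {
    "10.2": {"GeForce GTX 480", "GeForce GTX 580"},
}

def should_use_gpu():
    cuda_version = torch.version.cuda  # e.g. "10.2"; None on CPU-only builds
    if cuda_version is None or not torch.cuda.is_available():
        return False
    name = torch.cuda.get_device_name(torch.cuda.current_device())
    return name not in DEPRECATED_GPUS.get(cuda_version, set())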
As far as I know, the only airtight way to check cuda / gpu compatibility is torch.cuda.is_available() (and, to be completely sure, actually perform a tensor operation on the gpu). That’s what I do on my own machines (but once I check that a given version of pytorch works with my gpu, I don’t have to keep doing it).
You want to check somebody else’s pytorch with somebody else’s gpu, so I would say it’s doubly important to actually run the gpu.
I would run a separate python process that runs a simple gpu test script before running your “real” program. (It could store its result in a text file or environment variable or simply inform the user.) When that process exits, it will release any cuda overhead.
If you want to get fancy, you could have your “real” program spawn your test script in a separate process, and proceed with or without the gpu depending on the test script’s results, as sketched below. Again, when the spawned process exits, its cuda overhead will be released.
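A minimal sketch of that spawn-and-test idea (the inline test script and the helper name gpu_works are my own placeholders):

import subprocess
import sys

# Inline "simple gpu test script": actually run a kernel on the gpu.
GPU_TEST = """
import torch
x = torch.ones(8, device="cuda")
assert (x + x).sum().item() == 16.0
"""

def gpu_works():
    # Run the test in a child process; any cuda overhead it allocates is
    # released when the child exits.
    result = subprocess.run([sys.executable, "-c", GPU_TEST], capture_output=True)
    return result.returncode == 0

device = "cuda" if gpu_works() else "cpu"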
Based on the code in torch.cuda.__init__ that was actually throwing the error, the following check seems to work:

import torch
from torch.cuda import device_count, get_device_capability

def is_cuda_compatible():
    compatible_device_count = 0
    if torch.version.cuda is not None:
        for d in range(device_count()):
            capability = get_device_capability(d)
            major = capability[0]
            minor = capability[1]
            # Mirror the arch math torch.cuda.__init__ uses before warning
            # about GPUs that are too old for the installed binaries.
            current_arch = major * 10 + minor
            min_arch = min((int(arch.split("_")[1]) for arch in torch.cuda.get_arch_list()),
                           default=35)
            if (not current_arch < min_arch
                    and not torch._C._cuda_getCompiledVersion() <= 9000):
                compatible_device_count += 1
    if compatible_device_count > 0:
        return True
    return False
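With the check in place, each process can (hypothetically) pick its device up front without paying the CUDA initialization cost:

device = "cuda" if is_cuda_compatible() else "cpu"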
Not sure if it’s 100% correct, but I’m putting it out here for feedback and in case anybody else needs it. Will be PRing it into torch itself later, since it seems like the kind of functionality it ought to have.