According to NVIDIA's official documentation, a CUDA application built to include PTX is forward-compatible: the PTX can run on any GPU with a compute capability higher than the one assumed when that PTX was generated. So I tried to find out whether torch-1.7.0+cu101 was compiled into a binary that includes PTX. It seems that PyTorch is actually compiled with the nvcc flag `-gencode=arch=compute_xx,code=sm_xx` (see PyTorch's CMakeLists.txt), and I think this flag means the compiled product contains PTX. However, when I try to use PyTorch 1.7 with CUDA 10.1 on an A100, I always get an error:
```
>>> import torch
>>> torch.zeros(1).cuda()
/data/miniconda3/lib/python3.7/site-packages/torch/cuda/ UserWarning:
A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the A100-SXM4-40GB GPU with PyTorch, please check the instructions at
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/miniconda3/lib/python3.7/site-packages/torch/", line 179, in __repr__
    return torch._tensor_str._str(self)
  File "/data/miniconda3/lib/python3.7/site-packages/torch/", line 372, in _str
    return _str_intern(self)
  File "/data/miniconda3/lib/python3.7/site-packages/torch/", line 352, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/data/miniconda3/lib/python3.7/site-packages/torch/", line 241, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data/miniconda3/lib/python3.7/site-packages/torch/", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) &
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
So I really want to know why the "PTX compatibility principle" does not apply to PyTorch. Other answers only say to use CUDA 11 or higher, and I know that works, but they don't explain the real reason why PyTorch built for CUDA 10.1 does not work on the A100. I tried the CUDA 10.1 samples in the toolkit, and these small demo applications actually work:
```
[Matrix Multiply Using CUDA] - Starting...
MapSMtoCores for SM 8.0 is undefined. Default to use 64 Cores/SM
GPU Device 0: "A100-SXM4-40GB" with compute capability 8.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
Performance= 4286.91 GFlop/s, Time= 0.031 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
```
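To make the compatibility question concrete, here is a simplified model of the rules discussed above. It assumes the documented nvcc `-gencode` semantics: `code=sm_XY` embeds SASS (a cubin) that runs only on devices with the same major version and an equal or higher minor version, while `code=compute_XY` embeds PTX that the driver can JIT-compile for any device with compute capability >= X.Y. The helper names `embedded_images` and `can_run` are hypothetical, and the model deliberately ignores driver versions and the separately compiled CUDA libraries (cuBLAS, cuDNN) that PyTorch also links against, so treat it as a sketch rather than the real loader logic.

```python
"""Simplified (hypothetical) model of which embedded CUDA images can serve a GPU.

Assumptions, not NVIDIA's actual loader:
- code=sm_XY      -> SASS, runs on compute capability X.z with z >= Y
- code=compute_XY -> PTX, JIT-compilable for any compute capability >= X.Y
Driver version and library coverage are ignored.
"""
import re
from typing import List, Tuple

# Captures the code= field of one -gencode entry, bracketed list or single value.
GENCODE_RE = re.compile(r"arch=compute_\d+,code=(\[[^\]]*\]|\S+)")
# Splits a code= field into (kind, number) pairs such as ("sm", "75").
CODE_RE = re.compile(r"(sm|compute)_(\d+)")


def embedded_images(flags: str) -> List[Tuple[str, int]]:
    """Return (kind, arch) pairs; kind is 'sass' for sm_XY and 'ptx' for compute_XY."""
    images = []
    for code_field in GENCODE_RE.findall(flags):
        for kind, num in CODE_RE.findall(code_field):
            images.append(("sass" if kind == "sm" else "ptx", int(num)))
    return images


def can_run(images: List[Tuple[str, int]], capability: int) -> bool:
    """capability is written as an int, e.g. 80 for compute capability 8.0."""
    major, minor = divmod(capability, 10)
    for kind, arch in images:
        a_major, a_minor = divmod(arch, 10)
        # SASS: same major version, device minor >= compiled minor.
        if kind == "sass" and a_major == major and a_minor <= minor:
            return True
        # PTX: any device at or above the virtual architecture.
        if kind == "ptx" and arch <= capability:
            return True
    return False


if __name__ == "__main__":
    # SASS-only flags for the arch list printed in the warning above.
    sass_only = " ".join(
        f"-gencode=arch=compute_{a},code=sm_{a}" for a in (37, 50, 60, 70, 75)
    )
    # Hypothetical comparison build: same list plus PTX for the highest arch.
    with_ptx = sass_only + " -gencode=arch=compute_75,code=compute_75"
    for name, flags in [("sass_only", sass_only), ("with_ptx", with_ptx)]:
        print(name, "runs on sm_80:", can_run(embedded_images(flags), 80))
```

Under this model, a build whose flags contain only `code=sm_xx` entries has no image that can serve an sm_80 device, whereas adding a `code=compute_xx` entry gives the driver PTX to JIT-compile.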
If anyone could help me with an answer I would be very grateful.