PyTorch 1.2.0 with compute capability 3.5


I am using a self-compiled PyTorch 1.2.0 with CUDA compute capability 5.2 from C++, and everything works as expected.

I read somewhere that everything down to compute capability 3.5 is supported. Since we aim to support as many graphics cards as possible, I tried to compile PyTorch with compute capability 3.5. However, I get an error when using this version:
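For reference, this is roughly how I selected the target architecture for the build (a sketch only; the exact invocation and paths on my machine differ, but TORCH_CUDA_ARCH_LIST is the variable the PyTorch build reads):

```shell
# Illustrative build sketch: restrict the generated kernels to a single
# compute capability via TORCH_CUDA_ARCH_LIST (picked up by PyTorch's build).
# On Windows (cmd): set TORCH_CUDA_ARCH_LIST=3.5
export TORCH_CUDA_ARCH_LIST="3.5"
python setup.py install
```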

THCudaCheck FAIL file=D:/tools/pytorch-v1.2.0/aten/src\THC/generic/ line=16 error=209 : no kernel image is available for execution on the device
exception message: cuda runtime error (209) : no kernel image is available for execution on the device at D:/tools/pytorch-v1.2.0/aten/src\THC/generic/
The above operation failed in interpreter, with the following stack trace:
at code/
_135 = getattr(_131, "1")
_136 = _135.weight
_137 = _135.bias
_138 = getattr(self.decoder0, "0")
_139 = _138.weight
_140 = _138.bias
_141 = getattr(self.logit, "0")
_142 = _141.weight
_143 = _141.bias
input0 = torch._convolution(input, _1, None, [2, 2], [3, 3], [1, 1], False, [0, 0], 1, True, False, True)
~~~~~~~~~~~~~~~~~~ <--- HERE
input1 = torch.batch_norm(input0, weight, bias, running_mean, running_var, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input2 = torch.relu
input3 = torch.max_pool2d(input2, [3, 3], [2, 2], [1, 1], [1, 1], False)
input4 = torch._convolution(input3, 5, None, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, True, False, True)
input5 = torch.batch_norm(input4, weight0, bias0, running_mean0, running_var0, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input6 = torch.relu
input7 = torch._convolution(input6, 7, None, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, True, False, True)
out = torch.batch_norm(input7, weight1, bias1, running_mean1, running_var1, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input8 = torch.add(out, input3, alpha=1)
Compiled from code:
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ call
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ call
ā€¦/dl/models/ forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/ call
/opt/conda/lib/python3.6/site-packages/torch/jit/ trace_module
/opt/conda/lib/python3.6/site-packages/torch/jit/ trace
/opt/conda/lib/python3.6/site-packages/IPython/core/ run_code
/opt/conda/lib/python3.6/site-packages/IPython/core/ run_ast_nodes
/opt/conda/lib/python3.6/site-packages/IPython/core/ run_cell_async
/opt/conda/lib/python3.6/site-packages/IPython/core/ _pseudo_sync_runner
/opt/conda/lib/python3.6/site-packages/IPython/core/ _run_cell
/opt/conda/lib/python3.6/site-packages/IPython/core/ run_cell
/opt/conda/lib/python3.6/site-packages/ipykernel/ run_cell
/opt/conda/lib/python3.6/site-packages/ipykernel/ do_execute
/opt/conda/lib/python3.6/site-packages/tornado/ wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/ execute_request
/opt/conda/lib/python3.6/site-packages/tornado/ wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/ dispatch_shell
/opt/conda/lib/python3.6/site-packages/tornado/ wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/ process_one
/opt/conda/lib/python3.6/site-packages/tornado/ run
/opt/conda/lib/python3.6/site-packages/tornado/ inner
/opt/conda/lib/python3.6/site-packages/tornado/ _run_callback
/opt/conda/lib/python3.6/asyncio/ _run
/opt/conda/lib/python3.6/asyncio/ _run_once
/opt/conda/lib/python3.6/asyncio/ run_forever
/opt/conda/lib/python3.6/site-packages/tornado/platform/ start
/opt/conda/lib/python3.6/site-packages/ipykernel/ start
/opt/conda/lib/python3.6/site-packages/traitlets/config/ launch_instance
/opt/conda/lib/python3.6/ _run_code
/opt/conda/lib/python3.6/ _run_module_as_main

So I assume compute capability 3.5 is no longer fully supported?
Down to which compute capability should PyTorch 1.2.0 work correctly?
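On the machine where it fails, the device's compute capability can be checked at runtime, e.g.:

```python
import torch

# Query the compute capability of the first visible CUDA device, if any.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Device 0 compute capability: {major}.{minor}")
else:
    print("No CUDA device visible to this build")
```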

My system:
Windows 10
Visual Studio 2019 - CUDA 10.1
Python 3.7
Self-Compiled PyTorch 1.2.0



I have now tried building with compute capability 5.0, and this build works. So I assume PyTorch requires a minimum compute capability of 5.0. Is this correct?


The comments about PyTorch working all the way down to cc 3.5 are quite old. I'm afraid this is not true anymore.

Actually, 3.5 is in the CUDA_ARCH_LIST, so I wonder why it is not supported.


Interestingly, when I use the prebuilt Windows version built with CUDA/cuDNN, it works on a graphics card with cc 3.7 (Tesla K80). However, when I build PyTorch myself with CUDA but without cuDNN, it does not work on this graphics card.

Is it possible that some operations are only implemented for a higher compute capability in plain CUDA, but when cuDNN is used those operations are provided by cuDNN instead, and hence it works?
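One way to narrow this down (just a guess on my part) would be to compare the cuDNN status of the two builds at runtime:

```python
import torch

# Report whether this PyTorch build was compiled with cuDNN support
# and, if so, which cuDNN version it links against.
print("cuDNN available:", torch.backends.cudnn.is_available())
if torch.backends.cudnn.is_available():
    print("cuDNN version:", torch.backends.cudnn.version())
```

If the prebuilt binary reports a cuDNN version while the self-compiled one does not, that would support the theory that cuDNN supplies the kernels that work on cc 3.7.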