PyTorch 1.2.0 with compute capability 3.5

Hi,

I am using PyTorch 1.2.0 from C++, self-compiled for CUDA compute capability 5.2, and everything works as expected.

I read somewhere that everything down to compute capability 3.5 is supported. Since we aim to support as many graphics cards as possible, I tried to compile PyTorch for compute capability 3.5. However, I get an error when using this build:

.THCudaCheck FAIL file=D:/tools/pytorch-v1.2.0/aten/src\THC/generic/THCTensorMath.cu line=16 error=209 : no kernel image is available for execution on the device
exception message: cuda runtime error (209) : no kernel image is available for execution on the device at D:/tools/pytorch-v1.2.0/aten/src\THC/generic/THCTensorMath.cu:16
The above operation failed in interpreter, with the following stack trace:
at code/model-input_rgbip-output_14classes_best_train_2019_08_03_cpu-eval-mode-export_latest_pytorch.py:292:12
_135 = getattr(_131, "1")
_136 = _135.weight
_137 = _135.bias
_138 = getattr(self.decoder0, "0")
_139 = _138.weight
_140 = _138.bias
_141 = getattr(self.logit, "0")
_142 = _141.weight
_143 = _141.bias
input0 = torch._convolution(input, 1, None, [2, 2], [3, 3], [1, 1], False, [0, 0], 1, True, False, True)
~~~~~~~~~~~~~~~~~~ <--- HERE
input1 = torch.batch_norm(input0, weight, bias, running_mean, running_var, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input2 = torch.relu_(input1)
input3 = torch.max_pool2d(input2, [3, 3], [2, 2], [1, 1], [1, 1], False)
input4 = torch._convolution(input3, 5, None, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, True, False, True)
input5 = torch.batch_norm(input4, weight0, bias0, running_mean0, running_var0, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input6 = torch.relu_(input5)
input7 = torch._convolution(input6, 7, None, [1, 1], [1, 1], [1, 1], False, [0, 0], 1, True, False, True)
out = torch.batch_norm(input7, weight1, bias1, running_mean1, running_var1, False, 0.10000000000000001, 1.0000000000000001e-05, True)
input8 = torch.add_(out, input3, alpha=1)
Compiled from code /opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py(340): forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(523): _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(537): __call__
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py(92): forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(523): _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(537): __call__
…/dl/models/unet.py(153): forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(523): _slow_forward
/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(537): __call__
/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py(883): trace_module
/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py(751): trace
(5):
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py(3296): run_code
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py(3214): run_ast_nodes
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py(3049): run_cell_async
/opt/conda/lib/python3.6/site-packages/IPython/core/async_helpers.py(67): _pseudo_sync_runner
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py(2874): _run_cell
/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py(2848): run_cell
/opt/conda/lib/python3.6/site-packages/ipykernel/zmqshell.py(536): run_cell
/opt/conda/lib/python3.6/site-packages/ipykernel/ipkernel.py(294): do_execute
/opt/conda/lib/python3.6/site-packages/tornado/gen.py(209): wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/kernelbase.py(534): execute_request
/opt/conda/lib/python3.6/site-packages/tornado/gen.py(209): wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/kernelbase.py(267): dispatch_shell
/opt/conda/lib/python3.6/site-packages/tornado/gen.py(209): wrapper
/opt/conda/lib/python3.6/site-packages/ipykernel/kernelbase.py(357): process_one
/opt/conda/lib/python3.6/site-packages/tornado/gen.py(742): run
/opt/conda/lib/python3.6/site-packages/tornado/gen.py(781): inner
/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py(743): _run_callback
/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py(690):
/opt/conda/lib/python3.6/asyncio/events.py(145): _run
/opt/conda/lib/python3.6/asyncio/base_events.py(1451): _run_once
/opt/conda/lib/python3.6/asyncio/base_events.py(438): run_forever
/opt/conda/lib/python3.6/site-packages/tornado/platform/asyncio.py(148): start
/opt/conda/lib/python3.6/site-packages/ipykernel/kernelapp.py(505): start
/opt/conda/lib/python3.6/site-packages/traitlets/config/application.py(658): launch_instance
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py(16):
/opt/conda/lib/python3.6/runpy.py(85): _run_code
/opt/conda/lib/python3.6/runpy.py(193): _run_module_as_main

So I assume compute capability 3.5 is no longer fully supported?
Down to which compute capability should PyTorch 1.2.0 work correctly?

My system:
Windows 10
Visual Studio 2019 - CUDA 10.1
Python 3.7
Self-Compiled PyTorch 1.2.0

Thanks!

Best,
Thomas

I have now built with compute capability 5.0, and this build works. So I assume PyTorch requires 5.0 as its minimum compute capability. Is this correct?
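
When comparing builds like this, it helps to confirm which compute capability the target card actually reports. The following is a minimal sketch from Python (the thread itself uses the C++ API); the helper name `get_compute_capability` is made up for illustration, and it returns `None` when PyTorch or a CUDA device is not available.

```python
def get_compute_capability(device_index=0):
    """Return (major, minor) for the given CUDA device, or None if unavailable.

    Hypothetical helper for illustration; only torch.cuda.is_available and
    torch.cuda.get_device_capability are actual PyTorch calls.
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # CPU-only build, or no usable CUDA device/driver
    # Queries device properties only; it does not launch a kernel.
    return tuple(torch.cuda.get_device_capability(device_index))


if __name__ == "__main__":
    print("compute capability:", get_compute_capability())
```

Because this only reads device properties, it should still report a value on a build whose kernels cannot run on the card.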

Hi,

The comments about PyTorch working all the way down to cc 3.5 are quite old. I'm afraid this is not true anymore.

Actually 3.5 is in the CUDA_ARCH_LIST, I wonder why that is not supported.
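
As a rough aid for reasoning about "no kernel image is available", here is a sketch that checks whether an architecture list covers a given device. It assumes the list has the `3.5;5.0+PTX` form used by PyTorch's `TORCH_CUDA_ARCH_LIST` build variable, and it applies the general CUDA compatibility rules: binary code built for X.Y runs on devices with the same major version and a minor version of at least Y, while embedded PTX can be JIT-compiled for any equal or higher capability. The function names are hypothetical, and named entries such as `Kepler` are not handled.

```python
from typing import List, Tuple

Capability = Tuple[int, int]


def parse_arch_list(arch_list: str) -> List[Tuple[Capability, bool]]:
    """Parse e.g. "3.5;5.0;5.2+PTX" into [((3, 5), False), ..., ((5, 2), True)]."""
    entries = []
    # Entries may be separated by semicolons or whitespace.
    for token in arch_list.replace(";", " ").split():
        with_ptx = token.endswith("+PTX")
        version = token[: -len("+PTX")] if with_ptx else token
        major, minor = version.split(".")
        entries.append(((int(major), int(minor)), with_ptx))
    return entries


def device_is_covered(arch_list: str, device: Capability) -> bool:
    """Return True if some entry in arch_list yields code loadable on device."""
    for built, with_ptx in parse_arch_list(arch_list):
        # Binary (SASS) compatibility: same major, device minor >= built minor.
        binary_ok = built[0] == device[0] and built[1] <= device[1]
        # PTX compatibility: any device with equal or higher capability.
        ptx_ok = with_ptx and built <= device
        if binary_ok or ptx_ok:
            return True
    return False


if __name__ == "__main__":
    print(device_is_covered("5.2", (3, 5)))  # a 5.2-only build on a cc 3.5 card
    print(device_is_covered("3.5", (3, 7)))  # a 3.5 build on a cc 3.7 card
```

This only models which kernels the loader would accept. It does not say whether every PyTorch kernel is actually compiled for each listed architecture.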

Interestingly, when I use the prebuilt Windows version built with CUDA and cuDNN, it works on a graphics card with cc 3.7 (Tesla K80). However, when I build PyTorch myself with CUDA but without cuDNN, it does not work on this graphics card.

Is it possible that some instructions are only implemented for a higher cc in CUDA, but if cuDNN is used these instructions are implemented with cuDNN and hence it works?
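
One way to probe this hypothesis would be to run a small convolution on the prebuilt binary with cuDNN switched off and see whether it still succeeds on the K80. The sketch below is only a diagnostic idea, not a confirmed explanation: `conv_runs` is a made-up helper, the layer sizes are arbitrary, and it returns `None` when PyTorch or a CUDA device is missing.

```python
def conv_runs(use_cudnn: bool):
    """Try one Conv2d forward pass on the GPU with cuDNN enabled or disabled.

    Returns True on success, False if a RuntimeError is raised (for example
    "no kernel image is available"), or None if PyTorch/CUDA is unavailable.
    Hypothetical helper for illustration.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # global switch for cuDNN use
    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(1, 3, 32, 32, device="cuda")
        conv(x)
        torch.cuda.synchronize()  # surface asynchronous kernel errors here
        return True
    except RuntimeError as err:
        print("forward pass failed:", err)
        return False
    finally:
        torch.backends.cudnn.enabled = previous  # restore the old setting


if __name__ == "__main__":
    print("with cuDNN:   ", conv_runs(True))
    print("without cuDNN:", conv_runs(False))
```

If the prebuilt binary still works with cuDNN disabled, the difference between the two builds is more likely in how they were compiled than in cuDNN itself.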