Build from Source Failed

I get the following error when I try to build PyTorch from source. Any help is appreciated. The detailed log file is provided in the link.

Traceback (most recent call last):
File "", line 737, in
File "", line 316, in build_deps
File "/home/z840/pytorch/tools/", line 62, in build_caffe2
File "/home/z840/pytorch/tools/setup_helpers/", line 339, in build, my_env)
File "/home/z840/pytorch/tools/setup_helpers/", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/home/z840/miniconda3/envs/p37_build/lib/python3.7/", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '40']' returned non-zero exit status 1.

------------- MY SYSTEM AND CONFIGURATIONS -------
uname -a
Linux z840-HP-Z840-Workstation 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Thu Mar 5 20:42:56 2020
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 Quadro K4200 Off | 00000000:04:00.0 On | N/A |
| 37% 73C P0 44W / 110W | 321MiB / 4034MiB | 6% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 1480 G /usr/lib/xorg/Xorg 141MiB |
| 0 1732 G /usr/bin/gnome-shell 104MiB |
| 0 2229 C /usr/NX/bin/nxnode.bin 61MiB |

The commands I used are:
conda create --name p37_build python=3.7
conda activate p37_build
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
git clone --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
cd pytorch/
python install 2>&1 > build.log
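One caveat about that last command: with `2>&1 > build.log`, stderr is duplicated onto the terminal *before* stdout is redirected to the file, so the compiler errors never land in the log. Putting the file redirect first captures both streams. A toy demonstration of the ordering (using `sh -c` as a stand-in for the build command, not the actual build):

```shell
# Redirection order matters: '2>&1 > f' first points stderr at the
# current stdout (the terminal), THEN points stdout at the file, so
# errors are NOT written to the log.
sh -c 'echo out; echo err 1>&2' 2>&1 > wrong.log

# '> f 2>&1' points stdout at the file first, then stderr at stdout,
# so both streams end up in the log.
sh -c 'echo out; echo err 1>&2' > right.log 2>&1
```

The same ordering applies to the real build command. Setting `MAX_JOBS=1` in the environment (which PyTorch's build scripts should honor) also helps, since with `-j 40` the first real compiler error scrolls far away from the final failure message.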

There seem to be some errors finding/compiling CUDA files.

Note that a recent change means that you have to run python clean if you ever installed in this folder before (we are fixing that). Do you see the same thing after cleaning?
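A heavier-handed alternative to the clean step is letting git remove all generated state. This is illustrative only (shown in a throwaway repository, since in a real checkout `git clean -xfd` deletes *every* untracked and ignored file, including uncommitted local changes):

```shell
# Illustrative only: show what 'git clean -xfd' removes, in a scratch repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
echo src > tracked.txt
git add tracked.txt
git -c user.email=you@example.com -c user.name=you commit -qm init
touch stale_build_artifact.o        # stands in for leftover build output
git clean -xfd > /dev/null          # removes untracked and ignored files
```

Tracked files survive; anything the build generated outside version control is gone, which is what makes a rebuild start from a genuinely clean tree.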

I deleted everything (environments and the downloaded PyTorch files), then re-downloaded PyTorch and created a new environment with another name. I am still getting error messages. The new build log is at the link.
Before starting building, I ran some commands and the results are as follows:
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Quadro K4200
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 6.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 6.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 136.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Quadro K4200”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4034 MBytes (4230086656 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Max Clock rate: 784 MHz (0.78 GHz)
Memory Clock rate: 2700 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

nvcc -V
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105


I’m afraid the main reason is that the lowest officially supported compute capability is 3.5, while your device is 3.0 :confused:
So I guess we have code in these files that only works for compute capability 3.5+.
@smth who should I ping to double check and add a nicer error message here?
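A sketch of the kind of guard that could produce that nicer message before cmake ever runs (hypothetical — the function name is made up, and the 3.5 minimum is taken from this thread, not from PyTorch's actual build scripts):

```shell
# Hypothetical pre-build guard: fail early with a readable message when
# the detected compute capability is below the supported minimum.
check_compute_capability() {
  detected="$1"
  minimum="3.5"
  # sort -V orders version strings numerically; if the smaller of the
  # two values is the detected one and they differ, the device is too old.
  lowest=$(printf '%s\n%s\n' "$detected" "$minimum" | sort -V | head -n 1)
  if [ "$lowest" = "$detected" ] && [ "$detected" != "$minimum" ]; then
    echo "error: GPU compute capability $detected is below the supported minimum $minimum" >&2
    return 1
  fi
  return 0
}

check_compute_capability 3.0 || echo "build aborted before cmake"
```

For a Quadro K4200 (capability 3.0, per the deviceQuery output above), this would abort with an explicit message instead of letting nvcc fail partway through a 40-job parallel build.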