Build from Source Failed

I get the following error when I try to build PyTorch from source. Any help is appreciated; the detailed log file is provided in the link.

Traceback (most recent call last):
  File "setup.py", line 737, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    cmake=cmake)
  File "/home/z840/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/z840/pytorch/tools/setup_helpers/cmake.py", line 339, in build
    self.run(build_args, my_env)
  File "/home/z840/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/home/z840/miniconda3/envs/p37_build/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '40']' returned non-zero exit status 1.
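For what it's worth, this traceback only says that the cmake step exited non-zero; check_call discards the child's output, so the actual compiler error is further up in the build log, not here. A minimal sketch of that mechanism, using `false` as a stand-in for the failing cmake command:

```python
import subprocess

# check_call raises CalledProcessError whenever the child process exits
# non-zero; "false" here stands in for the failing cmake invocation.
try:
    subprocess.check_call(["false"])
except subprocess.CalledProcessError as e:
    # Only the command and the exit status survive in the exception; the
    # real error message was printed by the child earlier in the log.
    print(e.returncode)  # prints 1
```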

------------- MY SYSTEM AND CONFIGURATION -------
uname -a
Linux z840-HP-Z840-Workstation 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi
Thu Mar 5 20:42:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K4200        Off  | 00000000:04:00.0  On |                  N/A |
| 37%   73C    P0    44W / 110W |    321MiB /  4034MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1480      G   /usr/lib/xorg/Xorg                           141MiB |
|    0      1732      G   /usr/bin/gnome-shell                         104MiB |
|    0      2229      C   /usr/NX/bin/nxnode.bin                        61MiB |
+-----------------------------------------------------------------------------+

The commands I used are:
conda create --name p37_build python=3.7
conda activate p37_build
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
git clone --recursive https://github.com/pytorch/pytorch
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1
cd pytorch/
python setup.py install > build.log 2>&1
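One build knob worth knowing about here (my suggestion, not something from the log): the PyTorch build reads TORCH_CUDA_ARCH_LIST to decide which CUDA architectures nvcc compiles for, so it can be pinned to the card's own compute capability. A sketch extending the exports above:

```shell
# Same exports as in the commands above, plus TORCH_CUDA_ARCH_LIST to pin
# the build to a single compute capability (deviceQuery reports 3.0 for
# the Quadro K4200).
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1
export TORCH_CUDA_ARCH_LIST="3.0"
```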

There seem to be some errors finding/compiling CUDA files.

Note that a recent change means that you have to run python setup.py clean if you ever installed in this folder before (we are fixing that). Do you see the same thing after cleaning?

I deleted everything (environments and downloaded PyTorch files), then re-downloaded PyTorch and created a new environment with another name. I am still getting error messages. The new build log is at the link.
Before starting the build, I ran a few checks; the results are as follows:
./bin/x86_64/linux/release/bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Quadro K4200
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 6.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 6.6

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 136.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

./bin/x86_64/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Quadro K4200”
CUDA Driver Version / Runtime Version 10.1 / 10.1
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4034 MBytes (4230086656 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Max Clock rate: 784 MHz (0.78 GHz)
Memory Clock rate: 2700 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

Hi,

I'm afraid the main reason is that the lowest officially supported compute capability is 3.5, while your device is 3.0 :confused:
So I would guess we have code in these files that only works for compute capability 3.5+.
@smth, who should I ping to double-check and add a nicer error message here?
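The check such an error message would make explicit is just a tuple comparison between the device's compute capability and the minimum the build supports. A sketch of that check (the 3.5 minimum comes from the reply above; the helper name and constant are made up for illustration, not PyTorch code):

```python
# Hypothetical helper, not actual PyTorch code: compare a device's
# compute capability against the minimum the build supports.
MIN_CAPABILITY = (3, 5)  # minimum stated in the reply above

def capability_supported(major, minor, minimum=MIN_CAPABILITY):
    """Python tuple comparison orders (3, 0) < (3, 5) < (7, 0)."""
    return (major, minor) >= minimum

print(capability_supported(3, 0))  # Quadro K4200 reports 3.0 -> False
print(capability_supported(3, 5))  # True
```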