Failed to build PyTorch 1.6.0

  1. My environment:
➜  pytorch lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal
➜  pytorch uname -r
5.4.0-42-generic
➜  pytorch gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

➜  pytorch clang --version
clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
➜  pytorch python --version
Python 3.8.2
  2. Command I’m using to build PyTorch:
TORCH_CUDA_ARCH_LIST="6.1" NO_TEST=1 USE_MKLDNN=0 FULL_CAFFE2=1 python setup.py build
  3. Before the above command, I ran cmake with USE_SYSTEM_SLEEF set to ON.

  4. Error message obtained:

/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o: in function `_init':
(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp.o: in function `at::native::cudnn_affine_grid_generator_forward(at::Tensor const&, long, long, long, long)':
AffineGridGenerator.cpp:(.text+0xfc): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x529): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x5be): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x975): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0xbae): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp.o: in function `at::native::cudnn_affine_grid_generator_backward(at::Tensor const&, long, long, long, long)':
AffineGridGenerator.cpp:(.text+0x105c): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x1489): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x1535): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x18ec): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so
AffineGridGenerator.cpp:(.text+0x1b04): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/CMakeFiles/torch_cuda.dir/build.make:274250: lib/libtorch_cuda.so] Error 1
make[2]: Leaving directory '/mnt/data/jiapei/ml/dl/pytorch/pytorch/build'
make[1]: *** [CMakeFiles/Makefile2:3941: caffe2/CMakeFiles/torch_cuda.dir/all] Error 2
make[1]: Leaving directory '/mnt/data/jiapei/ml/dl/pytorch/pytorch/build'
make: *** [Makefile:144: all] Error 2
Traceback (most recent call last):
  File "setup.py", line 734, in <module>
    build_deps()
  File "setup.py", line 313, in build_deps
    build_caffe2(version=version,
  File "/mnt/data/jiapei/ml/dl/pytorch/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/mnt/data/jiapei/ml/dl/pytorch/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/mnt/data/jiapei/ml/dl/pytorch/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 2.
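
Since ld omits most of the overflows from the output, it can help to tally the ones it does print by relocation type and symbol. Below is a minimal sketch that does this for a saved build log. `summarize_overflows` and `OVERFLOW_RE` are hypothetical names made up for illustration, not part of PyTorch or binutils, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches ld diagnostics of the form:
#   <location>: relocation truncated to fit: <RELOC> against [undefined ]symbol `<name>' ...
OVERFLOW_RE = re.compile(
    r"relocation truncated to fit: (?P<reloc>\S+) against "
    r"(?:undefined )?symbol `(?P<symbol>[^']+)'"
)


def summarize_overflows(log_text):
    """Count reported relocation overflows per (relocation type, symbol)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OVERFLOW_RE.search(line)
        if match:
            counts[(match.group("reloc"), match.group("symbol"))] += 1
    return counts


# Three lines taken from the error message above.
sample_log = (
    "(.init+0xb): relocation truncated to fit: R_X86_64_REX_GOTPCRELX "
    "against undefined symbol `__gmon_start__'\n"
    "AffineGridGenerator.cpp:(.text+0xfc): relocation truncated to fit: "
    "R_X86_64_REX_GOTPCRELX against symbol "
    "`c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so\n"
    "AffineGridGenerator.cpp:(.text+0x529): relocation truncated to fit: "
    "R_X86_64_REX_GOTPCRELX against symbol "
    "`c10::UndefinedTensorImpl::_singleton' defined in .bss section in ../lib/libc10.so\n"
)

for (reloc, symbol), count in summarize_overflows(sample_log).most_common():
    print(f"{count:3d}  {reloc}  {symbol}")
```

In the log above, every reported overflow is an R_X86_64_REX_GOTPCRELX relocation, and all but the first are against `c10::UndefinedTensorImpl::_singleton`.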

Can anybody please give me a hint? Thank you very much…
Pei

We are seeing a similar problem in this issue.
Could you try some of the suggestions there and see whether they work for you?
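
Separately, the linker output above ends with the hint "relink with --no-relax". One way to try that hint is to pass the flag through the LDFLAGS environment variable, which CMake reads when it first configures a build tree. This is a minimal sketch under that assumption: `env_with_ldflags` is a hypothetical helper, it has not been confirmed to resolve this particular overflow, and an already configured build directory would likely need to be cleared for the new flag to take effect.

```python
import os
import shlex


def env_with_ldflags(base_env, extra_flags):
    """Return a copy of base_env with extra_flags appended to LDFLAGS (no duplicates)."""
    env = dict(base_env)
    flags = shlex.split(env.get("LDFLAGS", ""))
    for flag in extra_flags:
        if flag not in flags:
            flags.append(flag)
    env["LDFLAGS"] = " ".join(flags)
    return env


# -Wl,--no-relax asks gcc to forward --no-relax to ld, as the error message suggests.
build_env = env_with_ldflags(os.environ, ["-Wl,--no-relax"])

# Same settings as the original build command.
build_env.update(
    {
        "TORCH_CUDA_ARCH_LIST": "6.1",
        "NO_TEST": "1",
        "USE_MKLDNN": "0",
        "FULL_CAFFE2": "1",
    }
)

print("LDFLAGS =", build_env["LDFLAGS"])

# To start the build from the PyTorch source directory, uncomment:
# import subprocess, sys
# subprocess.check_call([sys.executable, "setup.py", "build"], env=build_env)
```

The build call is left commented out so the snippet can be run anywhere to inspect the resulting LDFLAGS value first.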