Error when building PyTorch 1.4.0 from source

Saguaro · February 27, 2023, 10:48am

When building PyTorch 1.4.0 on Ubuntu 20.04 with CUDA 11.3 and cuDNN 8.8.0.121, I get the following error:

...
[1682/3492] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/torch_generated_SparseCUDABlas.cu.o
FAILED: caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/torch_generated_SparseCUDABlas.cu.o /home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/torch_generated_SparseCUDABlas.cu.o 
cd /home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda && /home/b1-gpu/miniconda3/envs/CenterTrack/bin/cmake -E make_directory /home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/. && /home/b1-gpu/miniconda3/envs/CenterTrack/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/./torch_generated_SparseCUDABlas.cu.o -D generated_cubin_file:STRING=/home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/./torch_generated_SparseCUDABlas.cu.o.cubin.txt -P /home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/torch_generated_SparseCUDABlas.cu.o.Release.cmake
/home/b1-gpu/apps/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(118): error: identifier "cusparseScsrmm2" is undefined

/home/b1-gpu/apps/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(141): error: identifier "cusparseDcsrmm2" is undefined

2 errors detected in the compilation of "/home/b1-gpu/apps/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu".
CMake Error at torch_generated_SparseCUDABlas.cu.o.Release.cmake:281 (message):
  Error generating file
  /home/b1-gpu/apps/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/native/sparse/cuda/./torch_generated_SparseCUDABlas.cu.o
...
ninja: build stopped: subcommand failed.
Building wheel torch-1.4.0a0+7f73f1d
-- Building version 1.4.0a0+7f73f1d
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/b1-gpu/apps/pytorch/torch -DCMAKE_PREFIX_PATH=/home/b1-gpu/miniconda3/envs/CenterTrack/lib/python3.9/site-packages -DNUMPY_INCLUDE_DIR=/home/b1-gpu/miniconda3/envs/CenterTrack/lib/python3.9/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/home/b1-gpu/miniconda3/envs/CenterTrack/bin/python -DPYTHON_INCLUDE_DIR=/home/b1-gpu/miniconda3/envs/CenterTrack/include/python3.9 -DPYTHON_LIBRARY=/home/b1-gpu/miniconda3/envs/CenterTrack/lib/libpython3.9.a -DTORCH_BUILD_VERSION=1.4.0a0+7f73f1d -DUSE_NUMPY=True /home/b1-gpu/apps/pytorch
cmake --build . --target install --config Release -- -j 24

I cannot post the whole trace because it is too long.
I also get many warnings similar to

CMake Warning at modules/observers/CMakeLists.txt:12 (add_library):
  Cannot generate a safe runtime search path for target caffe2_observers
  because files in some directories may conflict with libraries in implicit
  directories:

    runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda-11.3/lib64

  Some of these libraries may not be found correctly.

and

[1545/3492] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCStorage.cu.o
In file included from /usr/local/cuda-11.3/include/thrust/detail/config/config.h:27,
                 from /usr/local/cuda-11.3/include/thrust/detail/config.h:23,
                 from /usr/local/cuda-11.3/include/thrust/device_ptr.h:24,
                 from /home/b1-gpu/apps/pytorch/aten/src/THC/THCStorage.cu:4:
/usr/local/cuda-11.3/include/thrust/detail/config/cpp_dialect.h:118:13: warning: Thrust requires C++14. Please pass -std=c++14 to your compiler. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
  118 |   THRUST_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);

What is the root cause of the problem?

ptrblck · February 27, 2023, 10:53pm

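PyTorch 1.4.0 was released in January 2020, while CUDA 11.3 was released in April 2021, so this PyTorch version predates your toolkit, which most likely explains the error. The undefined identifiers cusparseScsrmm2 and cusparseDcsrmm2 fit that picture: as far as I know, these legacy cuSPARSE routines are no longer shipped with CUDA 11.
Use a newer PyTorch release that supports CUDA 11.3, or downgrade your CUDA toolkit if you really need to build this old PyTorch version.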
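
To check a toolkit before starting a long build, something like the following rough sketch could help. It parses the release number from `nvcc --version` output and flags toolkits that probably lack the csrmm2 routines. The helper names `cuda_release` and `has_legacy_csrmm2` are made up for illustration, the sample string only mimics typical nvcc output, and the 11.0 cutoff is an assumption based on my recollection of the CUDA release notes, so please verify it against your toolkit's cusparse.h.

```python
import re

# Illustrative tail of `nvcc --version`; the build string is only an example.
SAMPLE = "Cuda compilation tools, release 11.3, V11.3.109"


def cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_legacy_csrmm2(release):
    """Guess whether cusparseScsrmm2/cusparseDcsrmm2 are still available.

    Assumption: these routines are absent from CUDA 11.0 onward.
    """
    return release < (11, 0)


if __name__ == "__main__":
    release = cuda_release(SAMPLE)
    if release is None:
        print("Could not find a CUDA release number")
    elif has_legacy_csrmm2(release):
        print("CUDA", release, "should still provide csrmm2")
    else:
        print("CUDA", release, "likely lacks csrmm2; PyTorch 1.4.0 may not build")
```

To run it against a real installation, pass the captured output of `nvcc --version` to `cuda_release` instead of `SAMPLE`.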