Error building torch from source

Hello,

I am trying to build PyTorch from source and am getting this error:

  /usr/local/cuda/include/cub/util_allocator.cuh(694): error: namespace "cub" has no member "Debug"

To reproduce, run:

git clone https://github.com/osai-ai/dokai
cd dokai 
git checkout feat/OLIB-101/dokai-update
make build

All of the versions can be found in the docker/Dockerfile.base and docker/Dockerfile.pytorch files.
I have not found any documented constraints on the CUDA, cuDNN, or FFmpeg versions needed to build PyTorch, nor instructions for building it with pip instead of conda.

I guess you might be trying to build an older PyTorch release with a newer CUDA version, in which case you would need to update to the current master.

I am building my image from nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04, so my CUDA version is 11.5.1. My PyTorch version is v1.10.2.
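
To confirm which toolkit release the build actually sees inside the image, the version can be read from nvcc rather than inferred from the image tag. This is a minimal sketch; the helper name cuda_release is hypothetical, and it assumes nvcc prints its usual "release X.Y" field.

```shell
# Hypothetical helper: extract the "release X.Y" field from
# `nvcc --version` output supplied on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage inside the image (requires nvcc on PATH):
#   nvcc --version | cuda_release
```

Comparing this value with the CUDA versions supported by the PyTorch release being built is a quick first check before starting a long compile.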

I’ve also tried to build from the master branch. No luck.

I will build from master again and post the error I get.

Meanwhile, it would be great to know which versions of CUDA, cuDNN, PyTorch, and maybe FFmpeg are compatible for building from source. It would also be nice to look at the scripts you (I mean the PyTorch developers 🙂) use to build it from source, because all I have found is just python3 setup.py, and I suspect that is too simple to be true.

@ptrblck, it seems you were right! Torch builds successfully from master!

But torchvision does not. I am getting this error:

  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/deform_conv2d_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/nms_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/ps_roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autocast/roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autograd/deform_conv2d_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autograd/ps_roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autograd/ps_roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autograd/roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/deform_conv2d_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/nms_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/ps_roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/ps_roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cpu/roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/deform_conv2d_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/interpolate_aa_kernels.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/nms_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/ps_roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/ps_roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/roi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/cuda/roi_pool_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/deform_conv2d.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/interpolate_aa.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/nms.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/ps_roi_align.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/ps_roi_pool.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/quantized/cpu/qnms_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/quantized/cpu/qroi_align_kernel.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/roi_align.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/ops/roi_pool.o: No such file or directory
  x86_64-linux-gnu-g++: error: /workdir/vision/build/temp.linux-x86_64-3.8/workdir/vision/torchvision/csrc/vision.o: No such file or directory
  error: command '/usr/bin/x86_64-linux-gnu-g++' failed with exit code 1
  error: subprocess-exited-with-error

Do you know how to fix this?

Based on the error message, I guess a previously failed build might be causing the issue. Try to clean the build via python setup.py clean and rebuild torchvision.
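
The clean-and-rebuild suggestion can be sketched as follows. The helper name clean_build_artifacts is hypothetical, and removing build/ by hand is an assumption for the case where the clean command leaves object files behind.

```shell
# Hypothetical helper: remove stale build outputs from a source checkout.
clean_build_artifacts() {
    src_dir="$1"
    # build/ holds the temp.* object files named in the linker errors.
    rm -rf "$src_dir/build"
}

# Usage (run from the directory that contains the vision checkout):
#   (cd vision && python setup.py clean)
#   clean_build_artifacts vision
#   (cd vision && FORCE_CUDA=1 pip install -v .)
```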

I am building from scratch every time, so it cannot be a cache issue.
Here is my Dockerfile:

FROM dokai:base

ENV TORCH_CUDA_ARCH_LIST 5.2;6.0;6.1;7.0;7.5;8.0;8.6

# Build MAGMA
COPY docker/magma/make.inc make.inc
RUN MAGMA_VERSION=2.6.1 &&\
    ln -s /usr/local/cuda/lib64/libcudart.so /usr/lib/libcudart.so &&\
    wget http://icl.utk.edu/projectsfiles/magma/downloads/magma-${MAGMA_VERSION}.tar.gz &&\
    tar -xzf magma-${MAGMA_VERSION}.tar.gz &&\
    cp make.inc magma-${MAGMA_VERSION} &&\
    cd magma-${MAGMA_VERSION} &&\
    make -j$(nproc) && make install &&\
    cd .. && rm -rf magma-${MAGMA_VERSION} magma-${MAGMA_VERSION}.tar.gz make.inc

# Install PyTorch
RUN git clone --depth 1 -b master --single-branch https://github.com/pytorch/pytorch.git &&\
    cd pytorch &&\
    git submodule sync && git submodule update --init --recursive && \
    TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
    TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}" \
    USE_CUDA=ON \
    pip install -v . && \
    cd .. && rm -rf pytorch

# Hack to fix small bug
RUN sed -i "s/, '-v'/, '--version'/" \
    "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py"

# Install torchvision
RUN git clone --depth 1 -b main https://github.com/pytorch/vision.git &&\
    cd vision && \
    FORCE_CUDA=1 DEBUG=1 pip install -v . &&\
    cd .. && rm -rf vision
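
A note on build flags in RUN steps such as the PyTorch install: in a POSIX shell, a variable written as a prefix on the same command (VAR=value command) is placed in that command's environment, while VAR=value && command only sets an unexported shell variable that child processes do not see. A minimal sketch of the difference, using a hypothetical variable DEMO_FLAG:

```shell
# DEMO_FLAG is a hypothetical variable used only for this illustration.

# Form 1: assignment chained with &&. The variable is set in the shell
# but not exported, so the child process does not see it.
unset DEMO_FLAG
DEMO_FLAG=1 && chained=$(sh -c 'echo "${DEMO_FLAG:-unset}"')

# Form 2: assignment as a prefix of the command. The variable is placed
# in that command's environment.
unset DEMO_FLAG
prefixed=$(DEMO_FLAG=1 sh -c 'echo "${DEMO_FLAG:-unset}"')

echo "chained:  $chained"     # prints "chained:  unset"
echo "prefixed: $prefixed"    # prints "prefixed: 1"
```

Variables declared with a Dockerfile ENV instruction are already exported to every RUN step, so this only matters for values assigned inside the RUN command itself.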