Hi all,
I am trying to build a Docker image by following the documentation. However, when I run make -f docker.Makefile, it fails with the error message below:
#25 35.44 cmake -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/pytorch/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.7/site-packages;/opt/conda/bin/../ -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.7m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.7m.so.1.0 -DTORCH_BUILD_VERSION=1.13.0a0+git9be97ea -DUSE_NUMPY=True /opt/pytorch
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: executor failed running [/bin/sh -c TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" python setup.py install]: runc did not terminate sucessfully
docker.Makefile:48: recipe for target 'devel-image' failed
My Docker version is 19.03.14 (build 5eb3275d40), the OS is Ubuntu 18.04.1, and the CUDA version is 11.2. Please let me know if there is any other useful information I can provide.
Thank you for replying. I noticed several errors saying that a .bzl file does not exist, but I am using the master branch of PyTorch, so the files should be complete. Only part of the error log is shown below due to the character limit.
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98 configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98 aten/src/ATen/CMakeLists.txt:163 (append_filelist)
#25 30.98
#25 30.98
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98 configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98 aten/src/ATen/CMakeLists.txt:164 (append_filelist)
#25 30.98
#25 30.98
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98 configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98 aten/src/ATen/CMakeLists.txt:245 (append_filelist)
#25 30.98
#25 30.98
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98 configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98 aten/src/ATen/CMakeLists.txt:246 (append_filelist)
#25 30.98
#25 30.98
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98 configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98 aten/src/ATen/CMakeLists.txt:247 (append_filelist)
I’m not familiar with Bazel, as I don’t use it to build PyTorch from source, but a quick search yields this PR, which moved the file to the root folder, so check whether the file is indeed there.
I noticed that the Dockerfile copies the repository directory into the image through:
FROM dev-base as submodule-update
WORKDIR /opt/pytorch
COPY . .
RUN ls /opt/pytorch
RUN git submodule update --init --recursive --jobs 0
However, when I use RUN ls /opt/pytorch to print the copied contents, there is no build_variables.bzl in the image. The weird thing is that some other .bzl files do exist, such as defs.bzl and defs_gpu.bzl.
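One thing that might be worth ruling out (this is a guess, not something confirmed in this thread): COPY . . only copies files that are part of the build context, and Docker drops anything matched by a pattern in .dockerignore before the build starts. The sketch below is a rough way to list ignore patterns that could match build_variables.bzl. The helper name matching_ignore_patterns is made up for illustration, and the matching uses Python's fnmatch, which only approximates Docker's real rules (negated patterns are skipped and it may over-report), so treat a hit as a hint rather than proof.

```python
from fnmatch import fnmatch
from pathlib import Path


def matching_ignore_patterns(ignore_text, rel_path):
    """Return ignore patterns that appear to match rel_path.

    Hypothetical helper: an approximation of .dockerignore matching,
    not Docker's actual implementation.
    """
    name = rel_path.split("/")[-1]
    hits = []
    for raw in ignore_text.splitlines():
        pattern = raw.strip()
        # Skip blank lines, comments, and negations ("!pattern"),
        # which this sketch does not model.
        if not pattern or pattern.startswith("#") or pattern.startswith("!"):
            continue
        candidate = pattern.strip("/")
        # Compare against both the full relative path and the bare file name.
        if fnmatch(rel_path, candidate) or fnmatch(name, candidate):
            hits.append(pattern)
    return hits


if __name__ == "__main__":
    # Run from the root of the PyTorch checkout.
    ignore_file = Path(".dockerignore")
    if ignore_file.exists():
        text = ignore_file.read_text()
        for p in matching_ignore_patterns(text, "build_variables.bzl"):
            print("possibly excluded by:", p)
    else:
        print("no .dockerignore in the current directory")
```

If this prints a pattern, that would also fit the observation that defs.bzl and defs_gpu.bzl are copied while build_variables.bzl is not, since a pattern can match one file name without matching the others.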