Failure when building Docker image

Hi all,
I am trying to build a Docker image by following the documentation. However, when I run make -f docker.Makefile, it fails with the error message below:

#25 35.44 cmake -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/pytorch/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.7/site-packages;/opt/conda/bin/../ -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.7/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.7m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.7m.so.1.0 -DTORCH_BUILD_VERSION=1.13.0a0+git9be97ea -DUSE_NUMPY=True /opt/pytorch
------
failed to solve with frontend dockerfile.v0: failed to solve with frontend gateway.v0: rpc error: code = Unknown desc = failed to build LLB: executor failed running [/bin/sh -c TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all"     CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"     python setup.py install]: runc did not terminate sucessfully
docker.Makefile:48: recipe for target 'devel-image' failed

I am running Docker version 19.03.14, build 5eb3275d40, on Ubuntu 18.04.1 with CUDA 11.2. Please let me know if there is any other useful information I can provide.

Any help or idea would be appreciated!

Are you seeing any errors in the build log file?
The current error only says that the Docker build failed because of an earlier error.

Thank you for replying. I noticed several errors saying that a .bzl file does not exist, but I am using the master branch of PyTorch, so the files should all be present. Only part of the error log is shown below because of the character limit.

#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98   configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98   aten/src/ATen/CMakeLists.txt:163 (append_filelist)
#25 30.98 
#25 30.98 
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98   configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98   aten/src/ATen/CMakeLists.txt:164 (append_filelist)
#25 30.98 
#25 30.98 
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98   configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98   aten/src/ATen/CMakeLists.txt:245 (append_filelist)
#25 30.98 
#25 30.98 
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98   configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98   aten/src/ATen/CMakeLists.txt:246 (append_filelist)
#25 30.98 
#25 30.98 
#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.
#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):
#25 30.98   configure_file Problem configuring file
#25 30.98 Call Stack (most recent call first):
#25 30.98   aten/src/ATen/CMakeLists.txt:247 (append_filelist)
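
The excerpt above repeats the same error for every append_filelist call, so it can help to collapse the log to the distinct missing paths. The following is only a minimal sketch: the function name missing_files is hypothetical, and it assumes the build output was saved to a text file and that the lines follow the "CMake Error: File ... does not exist." wording shown above.

```python
import re

# Matches lines such as:
#   "#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist."
# The leading BuildKit step/timestamp prefix is ignored by using search().
MISSING_FILE = re.compile(r"CMake Error: File (\S+) does not exist\.")


def missing_files(log_text):
    """Return each missing path once, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = MISSING_FILE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Small sample taken from the log excerpt in this thread.
sample = (
    "#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.\n"
    "#25 30.98 CMake Error at cmake/Codegen.cmake:351 (configure_file):\n"
    "#25 30.98 CMake Error: File /opt/pytorch/build_variables.bzl does not exist.\n"
)
print(missing_files(sample))
```

For a full log, the same function can be given the contents of the saved file, for example open("make_log.txt").read(), assuming that file holds the build output.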

I’m not familiar enough with Bazel, as I’m not using it to build PyTorch from source, but a quick search yields this PR, which moved the file to the root folder, so check whether the file is indeed there.

Thanks for pointing that out. I checked, and build_variables.bzl is indeed in my root folder. My file structure is shown below:

.
|-- BUCK.oss
|-- BUILD.bazel
|-- CITATION
|-- CMakeLists.txt
|-- CODEOWNERS
|-- CODE_OF_CONDUCT.md
|-- CONTRIBUTING.md
|-- Dockerfile
|-- GLOSSARY.md
|-- LICENSE
|-- MANIFEST.in
|-- Makefile
|-- NOTICE
|-- README.md
|-- RELEASE.md
|-- SECURITY.md
|-- WORKSPACE
|-- android
|-- aten
|-- aten.bzl
|-- benchmarks
|-- binaries
|-- buckbuild.bzl
|-- build
|-- build.bzl
|-- build_variables.bzl
|-- c10
|-- c2_defs.bzl
|-- c2_test_defs.bzl
|-- caffe2
|-- cmake
|-- defs.bzl
|-- defs_gpu.bzl
|-- defs_hip.bzl
|-- docker.Makefile
|-- docs
|-- img.txt
|-- ios
|-- key.asc
|-- make_log.txt
|-- modules
|-- mypy-strict.ini
|-- mypy.ini
|-- mypy_plugins
|-- pt_defs.oss.bzl
|-- pt_template_srcs.bzl
|-- pyproject.toml
|-- pytest.ini
|-- requirements-flake8.txt
|-- requirements.txt
|-- scripts
|-- setup.py
|-- test
|-- third_party
|-- tools
|-- torch
|-- torchgen
|-- tree.txt
|-- ubsan.supp
|-- ufunc_defs.bzl
`-- version.txt
18 directories, 43 files

I noticed that the Dockerfile copies this directory into the image with:

FROM dev-base as submodule-update
WORKDIR /opt/pytorch
COPY . .
RUN ls /opt/pytorch
RUN git submodule update --init --recursive --jobs 0

However, when I add RUN ls /opt/pytorch to print the copied contents, build_variables.bzl is not in the image. The strange thing is that some other files do exist, such as defs.bzl and defs_gpu.bzl, among others.
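
To see exactly which entries are dropped between the host checkout and the image, the two listings can be compared directly. This is a rough sketch with a hypothetical helper name, missing_in_image; it assumes the output of RUN ls /opt/pytorch has been copied out of the build log as plain text and that no file names contain whitespace. One possible explanation, which is only a guess and not confirmed in this thread, is that a rule in .dockerignore filters the file out of the build context before COPY . . runs, so the names reported here could be checked against that file.

```python
def missing_in_image(host_entries, image_ls_output):
    """Return host entries that do not appear in the image listing, sorted.

    host_entries: iterable of names from the host checkout (e.g. os.listdir(".")).
    image_ls_output: text printed by `ls /opt/pytorch` during the build.
    """
    # Splitting on whitespace assumes no file name contains spaces.
    image_entries = set(image_ls_output.split())
    return sorted(set(host_entries) - image_entries)


# Names taken from this thread; the image listing is abbreviated for illustration.
host = ["build_variables.bzl", "defs.bzl", "defs_gpu.bzl"]
image_listing = "defs.bzl\ndefs_gpu.bzl\n"
print(missing_in_image(host, image_listing))
```

On a real checkout, host could be replaced with os.listdir(".") run from the repository root.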