I’m building torch and torchvision from source because my system is unfortunately stuck on CUDA 10.0 and I need torch 1.6+.
I can build and use torch fine; however, I get an error when I add the torchvision build. A condensed version of my Dockerfile is below:
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
ARG PYTHON_VERSION=3.6
RUN curl -fsSL -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh
# Use conda to install python and some packages
RUN /opt/conda/bin/conda install -y python=$PYTHON_VERSION
# Add conda python to the path
ENV PATH=/opt/conda/bin:$PATH
# Tools to build from source
RUN conda install -y numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
# CUDA10
RUN conda install -y -c pytorch magma-cuda100
# Compile torch (WORKS)
RUN cd / && \
    git clone --recursive https://github.com/pytorch/pytorch && \
    cd pytorch && \
    git submodule sync && \
    git submodule update --init --recursive && \
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} && \
    python setup.py install
# Compile torchvision (DOES NOT WORK)
RUN cd / && \
    git clone --recursive https://github.com/pytorch/vision.git && \
    cd vision && \
    python setup.py install
The error I get doesn’t seem very useful.
Edit: ah, I hadn’t spotted the real error …
Found no NVIDIA driver on your system
This is weird: why does torch install fine (with CUDA) but torchvision doesn’t?
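One possible explanation, which I have not confirmed: a plain `docker build` step normally runs without a GPU attached, and when TORCH_CUDA_ARCH_LIST is empty, torch's extension builder (`torch.utils.cpp_extension`, which torchvision's setup.py uses) tries to read the compute capability of the local GPU, which would fail with exactly this message. The torch build itself goes through CMake instead, which may be why it gets further. Below is a minimal pre-flight sketch under that assumption; `gpu_visible` and `check_torchvision_build_env` are hypothetical names, meant to run in the same RUN step just before `python setup.py install`:

```shell
#!/bin/sh
# Hypothetical pre-flight check for the torchvision build step.

# Succeeds only if nvidia-smi exists and runs successfully (-L lists the GPUs).
# In a plain `docker build` this is expected to fail.
gpu_visible() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

# Reports how the CUDA architectures would be chosen and fails early when
# neither a visible GPU nor TORCH_CUDA_ARCH_LIST is available.
check_torchvision_build_env() {
  if gpu_visible; then
    echo "GPU visible: architectures can be detected from the device"
  elif [ -n "${TORCH_CUDA_ARCH_LIST:-}" ]; then
    echo "no GPU visible: using TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
  else
    echo "no GPU visible and TORCH_CUDA_ARCH_LIST is unset" >&2
    return 1
  fi
}
```

This only turns the opaque failure into an early, explicit one; it does not change what gets compiled.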
I googled and saw that some people install torchvision like this (note that they set TORCH_CUDA_ARCH_LIST, whereas I didn’t):
ARG torchvision_tag='v0.5.0'
ARG torchvision_cuda='0'
RUN git clone --recursive https://github.com/pytorch/vision \
    && cd vision \
    && git checkout $torchvision_tag \
    && git submodule sync \
    && git submodule update --init --recursive
RUN cd vision \
    && . /opt/conda/bin/activate \
    && export TORCH_CUDA_ARCH_LIST="3.7;6.1;7.5" \
    && export FORCE_CUDA=$torchvision_cuda \
    && python setup.py install \
    && python setup.py bdist_wheel
RUN find vision -name '*.whl' \
    && mkdir -p /packages \
    && cp vision/dist/*.whl /packages
So I can give that a go. I’m not sure about FORCE_CUDA, but I will try setting it to True first. I will also use the arch list from the PyTorch Dockerfile, TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX", which is a bit different.
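On FORCE_CUDA: as far as I can tell from torchvision's setup.py around 0.6/0.7 (worth double-checking in the checked-out tag), the CUDA extension is built only when `torch.cuda.is_available()` is true and CUDA_HOME is set, or when `os.getenv('FORCE_CUDA', '0') == '1'`. If that is right, it is an exact string comparison, so `True` would not count and only `1` would; the snippet above also exports `FORCE_CUDA=$torchvision_cuda`, whose default is '0'. A sketch that mirrors this assumed check in shell (`force_cuda_enabled` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper mirroring the assumed torchvision check:
#   os.getenv('FORCE_CUDA', '0') == '1'   (an exact string comparison)
force_cuda_enabled() {
  [ "${FORCE_CUDA:-0}" = "1" ]
}

# Example: show what a few candidate values would do.
for value in 1 True true 0; do
  if (FORCE_CUDA="$value"; force_cuda_enabled); then
    echo "FORCE_CUDA=$value -> CUDA extension forced"
  else
    echo "FORCE_CUDA=$value -> not forced"
  fi
done
```

Under that assumption, the Dockerfile would need `ARG torchvision_cuda='1'` (or `export FORCE_CUDA=1`) rather than True.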
Is it simply a matter of listing the compute capabilities you want to target, so 7.0 for a V100 and, if I also want to run on a 2080, 7.5? So ideally it would be “7.0 7.5”?
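My understanding, to be verified against `torch/utils/cpp_extension.py` in the installed torch, is that each entry is a compute capability to generate code for, spaces are turned into semicolons before the list is split, and a `+PTX` suffix additionally embeds PTX for newer GPUs. The V100 is compute capability 7.0 and the RTX 2080 is 7.5, so "7.0 7.5" and "7.0;7.5" should be equivalent. The hypothetical helpers below map a few GPU models (covering the capabilities used in this post) to their capabilities and join them:

```shell
#!/bin/sh
# Hypothetical helper: compute capability for a few GPU models.
arch_for_gpu() {
  case "$1" in
    K80)  echo "3.7" ;;
    1080) echo "6.1" ;;   # GeForce GTX 1080
    V100) echo "7.0" ;;
    2080) echo "7.5" ;;   # GeForce RTX 2080
    *)    echo "unknown GPU: $1" >&2; return 1 ;;
  esac
}

# Joins the capabilities of all given GPUs with ';', skipping duplicates.
# The body runs in a subshell so `list` and `cap` do not leak into the caller.
arch_list_for() (
  list=""
  for gpu in "$@"; do
    cap="$(arch_for_gpu "$gpu")" || exit 1
    case ";$list;" in
      *";$cap;"*) ;;                      # already listed
      *) list="${list:+$list;}$cap" ;;    # append with ';' separator
    esac
  done
  echo "$list"
)

# Example: export TORCH_CUDA_ARCH_LIST="$(arch_list_for V100 2080)"
```

For a V100 plus a 2080 this gives `7.0;7.5`; appending `+PTX` to the last entry, as the PyTorch Dockerfile does with `7.0+PTX`, would also embed PTX for GPUs newer than the listed ones.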
Edit 2: None of that seems to have helped.
Step 41/56 : RUN cd vision && . /opt/conda/bin/activate && export TORCH_CUDA_ARCH_LIST="3.7;6.1;7.0;7.5" && export FORCE_CUDA=$torchvision_cuda && python setup.py install
---> Running in ff2dd49d614a
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Building wheel torchvision-0.7.0a0+78ed10c
And then the full error:
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 255, in <module>
    'clean': clean,
  File "/opt/conda/lib/python3.6/site-packages/setuptools/__init__.py", line 163, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 175, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 161, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/lib/python3.6/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 87, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 649, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 208, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 478, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1233, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1529, in _run_ninja_build
    raise RuntimeError(message)
RuntimeError: Error compiling objects for extension
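The final `RuntimeError: Error compiling objects for extension` only says that the ninja build failed; as far as I know, the actual compiler or nvcc message is printed earlier in the output, usually near a line starting with `FAILED:`. A minimal sketch for pulling out the first such line from a saved log (`first_build_error` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: print the first line (with its line number) that looks
# like the real failure in a saved build log.
first_build_error() {
  grep -n -m 1 -E 'FAILED:|error:' "$1"
}

# Possible usage, run interactively inside the container:
#   python setup.py install 2>&1 | tee build.log
#   first_build_error build.log
```

Because `tee` sits at the end of that pipe, the pipeline's exit status is tee's, so inside a Dockerfile RUN step it would hide the failure; this is meant for debugging rather than for the final image.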