I’m trying to build PyTorch from source with ROCm for my RX 580. I am using Ubuntu 18.04 on an i7-7700K with 32 GB of RAM.
Here are the commands I am using to install libraries in my environment:
sudo apt-get update
sudo apt-get install -y libnuma-dev libpciaccess-dev libncurses-dev
sudo apt-get install -y libopenblas-dev
# add repo
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
# install rocm
sudo apt update
sudo apt install -y rocm-dkms rocm-utils rocm-libs rocm-cmake miopen-hip miopengemm rocfft rocblas rocm-profiler cxlactivitylogger rocsparse hipsparse rocrand hip-thrust doxygen
# from https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Dockerfile
sudo apt-get update && sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl && \
curl -sL http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add - && \
sudo sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list' && \
sudo apt-get update && sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
sudo \
libelf1 \
build-essential \
bzip2 \
ca-certificates \
cmake \
ssh \
apt-utils \
pkg-config \
g++-multilib \
gdb \
git \
less \
libunwind-dev \
libfftw3-dev \
libelf-dev \
libncurses5-dev \
libomp-dev \
libpthread-stubs0-dev \
make \
miopen-hip \
miopengemm \
python3-dev \
python3-future \
python3-yaml \
python3-pip \
vim \
libssl-dev \
libboost-dev \
libboost-system-dev \
libboost-filesystem-dev \
libopenblas-dev \
rpm \
wget \
net-tools \
iputils-ping \
libnuma-dev \
rocm-dev \
rocrand \
rocblas \
rocfft \
hipsparse \
hip-thrust && \
curl -sL https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add - && \
sudo sh -c 'echo deb [arch=amd64] http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main > /etc/apt/sources.list.d/llvm7.list' && \
sudo sh -c 'echo deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main >> /etc/apt/sources.list.d/llvm7.list' && \
sudo apt-get update && sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
clang-7
export HIP_PLATFORM="hcc"
### BEGIN: CODE FROM https://github.com/pytorch/pytorch/blob/75a2d8e2de4a73e16c3ea22f781673ea3e15a1f9/docker/caffe2/jenkins/common/install_rocm.sh
# HIP has a bug that drops DEBUG symbols in generated MakeFiles.
# https://github.com/ROCm-Developer-Tools/HIP/pull/588
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
    sudo sed -i 's/set(_hip_build_configuration "${CMAKE_BUILD_TYPE}")/string(TOUPPER _hip_build_configuration "${CMAKE_BUILD_TYPE}")/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
# there is a case-sensitivity issue in the cmake files of some ROCm libraries. Fix this here until the fix is released
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocsparse/lib/cmake/rocsparse/rocsparse-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocfft/lib/cmake/rocfft/rocfft-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/miopen/lib/cmake/miopen/miopen-config.cmake
sudo sed -i 's/find_dependency(hip)/find_dependency(HIP)/g' /opt/rocm/rocblas/lib/cmake/rocblas/rocblas-config.cmake
### END
### BEGIN: CODE FROM https://github.com/pytorch/pytorch/blob/75a2d8e2de4a73e16c3ea22f781673ea3e15a1f9/docker/caffe2/jenkins/common/install_mkl.sh
# Needs https transport for apt
sudo apt-get update
sudo apt-get install -y --no-install-recommends apt-transport-https
# Add Intel MKL repository
key="https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB"
curl "${key}" | sudo apt-key add -
echo 'deb http://apt.repos.intel.com/mkl all main' | \
sudo tee /etc/apt/sources.list.d/intel-mkl.list
sudo apt-get update
# Multiple candidates for intel-mkl-64bit, so have to be specific
sudo apt-get install -y --no-install-recommends intel-mkl-64bit-2019.1-053
sudo rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Ensure loader can find MKL path
echo '/opt/intel/mkl/lib/intel64' | sudo tee /etc/ld.so.conf.d/intel-mkl.conf
sudo ldconfig
### END
# add usergroup
sudo usermod -a -G video $LOGNAME
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
# add to path
echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' | sudo tee -a /etc/profile.d/rocm.sh
source /etc/profile.d/rocm.sh
pip3 install tensorflow-rocm
pip3 uninstall -y torch
git clone --recursive https://github.com/pytorch/pytorch/
cd pytorch
# patch stuff
python3 tools/amd_build/build_amd.py
# install!
export HCC_AMDGPU_TARGET=gfx803
USE_ROCM=1 MAX_JOBS=8 python3 setup.py install --user
However, the build consistently fails while compiling caffe2/sgd/hip/fp16_momentum_sgd_op.hip. My full build log is available here.
Any ideas?
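For anyone trying to reproduce this, here is a rough sketch of how the first failure could be pulled out of a saved log, plus a check that the current shell session really has the video group (group changes made with usermod only apply to new login sessions). The function names and the build.log file name are placeholders I made up. The error: and FAILED: patterns assume clang-style and ninja-style output, so a make-based build may need different markers.

```shell
# Print the first line of a build log that looks like an error, prefixed
# with its line number, so the original failure is not buried under
# follow-on errors from parallel jobs.
# Assumption: "error:" (clang-style) and "FAILED:" (ninja-style) markers.
first_build_error() {
    local log_file="$1"
    grep -n -m 1 -E 'error:|FAILED:' "$log_file"
}

# Succeed if the *current session* is a member of group "$1".
# "id -nG" with no user argument reports the groups of the running process,
# which is what matters for access to the GPU device nodes.
session_in_group() {
    local group="$1"
    id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

# Example usage (build.log is a hypothetical file name):
#   USE_ROCM=1 MAX_JOBS=1 python3 setup.py install --user 2>&1 | tee build.log
#   first_build_error build.log
#   session_in_group video || echo "log out and back in to pick up the video group"
```

Running with MAX_JOBS=1 is slower, but it keeps the output of different compile jobs from interleaving, which makes the first error easier to read.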