Hello everyone,
I'm posting to see if anyone can help me compile PyTorch. Strangely, when I build without the WITH_DISTRIBUTED=1
parameter, the compilation seems to go well. But when I set WITH_DISTRIBUTED=1, python3 setup.py build_deps
fails with the error attached below. I need this flag enabled for parallelization.
I am using Debian 8.6.0, CMake 3.7.0, and Python 3.5.2, and I'm really lost.
THE ERROR:
/usr/include/string.h:66:14: note: ‘memset’
extern void *memset (void *__s, int __c, size_t __n) __THROW __nonnull ((1));
^
CMakeFiles/THD.dir/build.make:398: recipe for target 'CMakeFiles/THD.dir/master_worker/master/THDStorage.cpp.o' failed
make[2]: *** [CMakeFiles/THD.dir/master_worker/master/THDStorage.cpp.o] Error 1
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp: In member function ‘virtual void thd::DataChannelMPI::send(const thd::Scalar&, int)’:
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:329:52: error: invalid conversion from ‘const void*’ to ‘void*’ [-fpermissive]
MPI_UINT8_T, dst_rank, 0, MPI_COMM_WORLD);
^
In file included from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.hpp:5:0,
from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:1:
/usr/lib/openmpi/include/mpi.h:1384:20: note: initializing argument 1 of ‘int MPI_Send(void*, int, MPI_Datatype, int, int, MPI_Comm)’
OMPI_DECLSPEC int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
^
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp: In member function ‘virtual void thd::DataChannelMPI::send(thpp::Tensor&, int)’:
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:340:52: error: invalid conversion from ‘const void*’ to ‘void*’ [-fpermissive]
MPI_UINT8_T, dst_rank, 0, MPI_COMM_WORLD);
^
In file included from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.hpp:5:0,
from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:1:
/usr/lib/openmpi/include/mpi.h:1384:20: note: initializing argument 1 of ‘int MPI_Send(void*, int, MPI_Datatype, int, int, MPI_Comm)’
OMPI_DECLSPEC int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
^
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp: In member function ‘virtual THDGroup thd::DataChannelMPI::newGroup(const std::vector<int>&)’:
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:476:56: error: invalid conversion from ‘const int*’ to ‘int*’ [-fpermissive]
MPI_Group_incl(world_group, ranks.size(), ranks.data(), &ranks_group);
^
In file included from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.hpp:5:0,
from /soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:1:
/usr/lib/openmpi/include/mpi.h:1269:20: note: initializing argument 3 of ‘int MPI_Group_incl(MPI_Group, int, int*, ompi_group_t**)’
OMPI_DECLSPEC int MPI_Group_incl(MPI_Group group, int n, int *ranks,
^
/soft/pytorch-dist/torch/lib/THD/base/data_channels/DataChannelMPI.cpp:479:66: error: ‘MPI_Comm_create_group’ was not declared in this scope
MPI_Comm_create_group(MPI_COMM_WORLD, ranks_group, 0, &new_comm);
^
CMakeFiles/THD.dir/build.make:422: recipe for target 'CMakeFiles/THD.dir/master_worker/master/THDTensor.cpp.o' failed
make[2]: *** [CMakeFiles/THD.dir/master_worker/master/THDTensor.cpp.o] Error 1
CMakeFiles/THD.dir/build.make:158: recipe for target 'CMakeFiles/THD.dir/base/data_channels/DataChannelMPI.cpp.o' failed
make[2]: *** [CMakeFiles/THD.dir/base/data_channels/DataChannelMPI.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/THD.dir/all' failed
make[1]: *** [CMakeFiles/THD.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
I followed these instructions:
Using Python 3 (Python 3.4)
Install build dependencies
Essentials
sudo apt-get update
sudo apt-get install git build-essential
ccache
sudo apt-get install ccache
export CC="ccache gcc"
export CXX="ccache g++"
CMake
The default CMake version in Debian’s repositories is too old.
Ubuntu 16.10 has version 3.5.2 and it works fine.
wget https://cmake.org/files/v3.7/cmake-3.7.0.tar.gz
tar xf cmake-3.7.0.tar.gz
rm cmake-3.7.0.tar.gz
cd cmake-3.7.0
./bootstrap
make
sudo make install
cd ..
Install THD dependencies
Asio C++ Library
sudo apt-get install libasio-dev
MPI implementation
sudo apt-get install mpich
Set up Python
sudo apt-get install python3-dev python3-pip
Set up virtual environment
sudo pip3 install virtualenv
virtualenv venv
source venv/bin/activate
Install PyTorch
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$HOME/pytorch-dist/torch/lib"
git clone https://github.com/apaszke/pytorch-dist/
cd pytorch-dist
pip3 install -r requirements.txt
WITH_DISTRIBUTED=1 python3 setup.py build_deps
WITH_DISTRIBUTED=1 python3 setup.py develop
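One thing that may be worth checking: the log shows the compiler reading /usr/lib/openmpi/include/mpi.h, while the instructions install mpich. The failing calls (a const buffer passed to MPI_Send, a const int* passed to MPI_Group_incl, and the undeclared MPI_Comm_create_group) match declarations that only appear from MPI-3.0 on, so an older header could plausibly produce exactly these errors. The sketch below, assuming a POSIX shell with sed, prints the MPI standard version a header declares through its MPI_VERSION and MPI_SUBVERSION macros. The function names mpi_macro and mpi_header_version are hypothetical helpers, not part of any MPI distribution.

```shell
# Hypothetical helper: print the numeric value of a "#define NAME <n>" line.
mpi_macro() {
    # $1 = header path, $2 = macro name
    sed -n "s/^[[:space:]]*#[[:space:]]*define[[:space:]]\{1,\}${2}[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p" "$1" | head -n 1
}

# Hypothetical helper: print "major.minor" of the MPI standard a header declares.
mpi_header_version() {
    if [ ! -f "$1" ]; then
        echo "no such header: $1" >&2
        return 1
    fi
    echo "$(mpi_macro "$1" MPI_VERSION).$(mpi_macro "$1" MPI_SUBVERSION)"
}

# Header path taken from the error log; adjust it if your system differs.
header=/usr/lib/openmpi/include/mpi.h
if [ -f "$header" ]; then
    mpi_header_version "$header"
fi
```

If this prints a version below 3.0, the header predates those declarations, and pointing the build at an MPI-3 implementation would be the next thing to try. This is a diagnostic sketch, not a confirmed fix.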
Thanks a lot for your help.
Dani