PyTorch build from source on Power8/ppc64le architecture

I have a Power8 machine with 4 NVIDIA P100 GPUs. As there are no official binaries for the ppc64le architecture, I am building from source. However, I get the following error:

[ 96%] Building CXX object src/ATen/test/CMakeFiles/basic.dir/basic.cpp.o
[ 96%] Building CXX object src/ATen/test/CMakeFiles/scalar_tensor_test.dir/scalar_tensor_test.cpp.o
[ 97%] Building CXX object src/ATen/test/CMakeFiles/native_test.dir/native_test.cpp.o
[ 97%] Building CXX object src/ATen/test/CMakeFiles/scalar_test.dir/scalar_test.cpp.o
[ 98%] Linking CXX executable wrapdim_test
[ 98%] Linking CXX executable dlconvertor_test
[ 98%] Linking CXX executable undefined_tensor_test
[ 98%] Linking CXX executable atest
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/wrapdim_test.dir/build.make:104: recipe for target ‘src/ATen/test/wrapdim_test’ failed
make[2]: *** [src/ATen/test/wrapdim_test] Error 1
CMakeFiles/Makefile2:445: recipe for target ‘src/ATen/test/CMakeFiles/wrapdim_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/wrapdim_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/undefined_tensor_test.dir/build.make:104: recipe for target ‘src/ATen/test/undefined_tensor_test’ failed
make[2]: *** [src/ATen/test/undefined_tensor_test] Error 1
src/ATen/test/CMakeFiles/atest.dir/build.make:104: recipe for target ‘src/ATen/test/atest’ failed
make[2]: *** [src/ATen/test/atest] Error 1
CMakeFiles/Makefile2:593: recipe for target ‘src/ATen/test/CMakeFiles/undefined_tensor_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/undefined_tensor_test.dir/all] Error 2
CMakeFiles/Makefile2:334: recipe for target ‘src/ATen/test/CMakeFiles/atest.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/atest.dir/all] Error 2
src/ATen/test/CMakeFiles/dlconvertor_test.dir/build.make:104: recipe for target ‘src/ATen/test/dlconvertor_test’ failed
make[2]: *** [src/ATen/test/dlconvertor_test] Error 1
CMakeFiles/Makefile2:482: recipe for target ‘src/ATen/test/CMakeFiles/dlconvertor_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/dlconvertor_test.dir/all] Error 2
[100%] Linking CXX executable scalar_tensor_test
[100%] Linking CXX executable broadcast_test
[100%] Linking CXX executable scalar_test
[100%] Linking CXX executable basic
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/broadcast_test.dir/build.make:104: recipe for target ‘src/ATen/test/broadcast_test’ failed
make[2]: *** [src/ATen/test/broadcast_test] Error 1
CMakeFiles/Makefile2:408: recipe for target ‘src/ATen/test/CMakeFiles/broadcast_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/broadcast_test.dir/all] Error 2
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/scalar_tensor_test.dir/build.make:104: recipe for target ‘src/ATen/test/scalar_tensor_test’ failed
make[2]: *** [src/ATen/test/scalar_tensor_test] Error 1
CMakeFiles/Makefile2:519: recipe for target ‘src/ATen/test/CMakeFiles/scalar_tensor_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/scalar_tensor_test.dir/all] Error 2
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/scalar_test.dir/build.make:104: recipe for target ‘src/ATen/test/scalar_test’ failed
make[2]: *** [src/ATen/test/scalar_test] Error 1
CMakeFiles/Makefile2:556: recipe for target ‘src/ATen/test/CMakeFiles/scalar_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/scalar_test.dir/all] Error 2
[100%] Linking CXX executable native_test
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/basic.dir/build.make:104: recipe for target ‘src/ATen/test/basic’ failed
make[2]: *** [src/ATen/test/basic] Error 1
CMakeFiles/Makefile2:297: recipe for target ‘src/ATen/test/CMakeFiles/basic.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/basic.dir/all] Error 2
../libATen.so.1: undefined reference to `cudnnSetConvolutionGroupCount'
../libATen.so.1: undefined reference to `cudnnSetConvolutionMathType'
collect2: error: ld returned 1 exit status
src/ATen/test/CMakeFiles/native_test.dir/build.make:104: recipe for target ‘src/ATen/test/native_test’ failed
make[2]: *** [src/ATen/test/native_test] Error 1
CMakeFiles/Makefile2:371: recipe for target ‘src/ATen/test/CMakeFiles/native_test.dir/all’ failed
make[1]: *** [src/ATen/test/CMakeFiles/native_test.dir/all] Error 2
Makefile:127: recipe for target ‘all’ failed
make: *** [all] Error 2

It seems that the build found cuDNN at compile time but can't link against it. That's weird. Do you have cuDNN installed?
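
One way to check which cuDNN version the compiler sees is to read the version macros from `cudnn.h`. As far as I know, `cudnnSetConvolutionGroupCount` and `cudnnSetConvolutionMathType` first appeared in cuDNN 7, so a 7.x header combined with an older library at link time would produce exactly these undefined references. The sketch below is only an illustration: it assumes the cuDNN 7.x header layout, and the sample text stands in for a real header, whose location on your machine is not known here.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of a cuDNN 7.x cudnn.h, or None."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        fields[name] = int(match.group(1))
    return (fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"])


# Sample text standing in for a real header (assumption: 7.x macro layout).
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(parse_cudnn_version(sample))
```

To use it on a real install, pass the contents of the header the build picks up, for example `open(path_to_cudnn_h).read()`.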

This issue may help you: https://github.com/pytorch/pytorch/issues/3567#issuecomment-342895938

  1. conda uninstall cudnn
  2. Recompile.
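
Before uninstalling anything, it can help to list every copy of the cuDNN library that the linker might find, since a conda-installed copy can shadow the system one. This is a minimal sketch: the directories in `candidates` are assumptions for illustration and will differ per machine, while `CONDA_PREFIX` is the variable conda sets for the active environment.

```python
import glob
import os


def find_cudnn_libs(search_dirs):
    """Return sorted paths of libcudnn shared objects found directly in search_dirs."""
    found = []
    for directory in search_dirs:
        found.extend(glob.glob(os.path.join(directory, "libcudnn.so*")))
    return sorted(set(found))


if __name__ == "__main__":
    # Hypothetical search locations; adjust to the actual CUDA/cuDNN install.
    candidates = ["/usr/local/cuda/lib64", "/usr/lib/powerpc64le-linux-gnu"]
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(os.path.join(conda_prefix, "lib"))
    for path in find_cudnn_libs(candidates):
        # realpath resolves symlinks, which usually reveals the full version suffix.
        print(path, "->", os.path.realpath(path))
```

If the output shows two different major versions, the older one is a likely source of the link errors.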

Thank you very much for your replies. I do not have sudo rights on that machine. However, the error was apparently due to the installed cuDNN being incompatible with CUDA. I asked the admin to uninstall CUDA/cuDNN and reinstall CUDA 8.0 and “cuDNN v7.0.5 Library for Linux (Power8)”. Now PyTorch compiles without errors in Anaconda 2 and works.
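
After a successful build, the versions PyTorch actually ended up with can be checked from Python. A small sketch follows; the helper `decode_cudnn_version` is a name made up here, and it assumes the cuDNN 7.x encoding `MAJOR * 1000 + MINOR * 100 + PATCHLEVEL`, which other cuDNN major versions may not share.

```python
def decode_cudnn_version(version):
    """Split a cuDNN 7.x style version integer (e.g. 7005) into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        print("CUDA (build):", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
        version = torch.backends.cudnn.version()  # None when cuDNN is unavailable
        print("cuDNN:", decode_cudnn_version(version) if version is not None else None)
```

With the setup described above, one would expect the cuDNN line to correspond to 7.0.5.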

@osuemer could you tell me which version of PyTorch you used, and also the gcc version?
I am also having trouble compiling PyTorch on an IBM Power8 machine (with 4 P100 GPUs). I get a different error from yours:

[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathCompareTByte.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathCompareTLong.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathCompareChar.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMaskedLong.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathCompareTFloat.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathCompareHalf.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMaskedFloat.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/ATen_generated_THCHalf.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THC/generated/ATen_generated_THCTensorMathPointwiseDouble.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_BCECriterion.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_HardTanh.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_LeakyReLU.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_ELU.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_LookupTable.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_MultiMarginCriterion.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SmoothL1Criterion.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SoftShrink.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SpatialAveragePooling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SpatialFractionalMaxPooling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SpatialSubSampling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SpatialCrossMapLRN.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_SpatialMaxPooling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_VolumetricAdaptiveMaxPooling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_TemporalReflectionPadding.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_Square.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_VolumetricGridSamplerBilinear.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_VolumetricUpSamplingNearest.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/__/THCUNN/ATen_generated_VolumetricDilatedMaxPooling.cu.o
[ 52%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen.dir/cuda/detail/ATen_generated_IndexUtils.cu.o
sh: 1: Cannot fork
CMake Error at ATen_generated_THCTensorConv.cu.o.cmake:207 (message):
  Error generating
  /home/sathap1/Software/pytorch/torch/lib/build/aten/src/ATen/CMakeFiles/ATen.dir/__/THC/./ATen_generated_THCTensorConv.cu.o

src/ATen/CMakeFiles/ATen.dir/build.make:161: recipe for target 'src/ATen/CMakeFiles/ATen.dir/__/THC/ATen_generated_THCTensorConv.cu.o' failed
make[2]: *** [src/ATen/CMakeFiles/ATen.dir/__/THC/ATen_generated_THCTensorConv.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
slurmstepd: error: get_exit_code task 0 died by signal

I use gcc 5.4.
gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
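
The `sh: 1: Cannot fork` line suggests the shell could not create a new process. One possible cause, and this is only a guess, is that the build launched more parallel compile jobs than the Slurm allocation permits; Power8 machines expose many hardware threads, so the default job count can be very high. As far as I know, the PyTorch build reads the `MAX_JOBS` environment variable to limit parallelism. The sketch below picks a conservative value. The function name, the cap of 8, and the divide-by-16 rule are arbitrary choices for illustration, not recommendations from the PyTorch project.

```python
import os


def pick_max_jobs(cpu_count, process_limit=None, cap=8):
    """Choose a conservative parallel job count for a constrained build host."""
    jobs = min(cpu_count, cap)
    if process_limit is not None:
        # Arbitrary heuristic: each compile job forks helpers, so leave headroom.
        jobs = min(jobs, max(1, process_limit // 16))
    return max(1, jobs)


if __name__ == "__main__":
    limit = None
    try:
        import resource

        soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if soft != resource.RLIM_INFINITY:
            limit = soft
    except (ImportError, AttributeError):
        pass  # No per-user process limit information on this platform.
    jobs = pick_max_jobs(os.cpu_count() or 1, limit)
    print("export MAX_JOBS=%d" % jobs)
```

Setting the printed value in the shell before rerunning the build would show whether parallelism is the problem; if the error persists with `MAX_JOBS=1`, the cause is probably something else, such as a memory limit on the job.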