Specific ATen object causes PyTorch build to hang and then fail on Raspberry Pi Zero

Hi,

I am building PyTorch 1.7 from source on a Raspberry Pi Zero (Raspbian Lite, kernel version 4.19), and the build hangs at the same spot every single time, seemingly indefinitely. Here are my steps:

sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-dev python3-yaml python3-setuptools python3-wheel python3-pillow python3-numpy git

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

export USE_CUDA=0
export USE_CUDNN=0
export BUILD_TEST=0
export USE_MKLDNN=0
export USE_DISTRIBUTED=0
export USE_NNPACK=0

(I have also tried without setting these and still get stuck at the same spot, so I don't think they are the cause. I have also tried running python3 setup.py clean beforehand, with no luck.)

python3 setup.py install --verbose

It always hangs for a very long time (12+ hours) at this line: Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o

or it hangs there and produces this error:

c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/build.make:3632: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Functions.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1558: caffe2/CMakeFiles/torch_cpu.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
Traceback (most recent call last):
  File "setup.py", line 737, in
    build_deps()
  File "setup.py", line 321, in build_deps
    cmake=cmake)
  File "/home/pi/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/pi/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/home/pi/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '1']' returned non-zero exit status 2.

Does anyone have an idea what could be causing this and what I could do to resolve it?

Any help would be greatly appreciated!

Could you check the memory usage on your RPi to see whether you are running out of memory while compiling on the device?
If that's the case, you might need to add more swap space and rerun the build.
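
A minimal sketch of how the memory check could be done, assuming a Linux system that exposes /proc/meminfo. The helper name mem_summary and its optional file argument are made up for illustration. A "Killed signal terminated program cc1plus" message usually means the compiler received SIGKILL, which is commonly (though not always) the kernel's out-of-memory killer, so the commented dmesg line is a plausible way to confirm it.

```shell
# Print total/available RAM and swap in MiB from a meminfo-style file.
# Defaults to /proc/meminfo; the optional file argument is only there
# so the function can be tried against a saved copy.
mem_summary() {
    awk '
        /^MemTotal:/     { total  = $2 }
        /^MemAvailable:/ { avail  = $2 }
        /^SwapTotal:/    { stotal = $2 }
        /^SwapFree:/     { sfree  = $2 }
        END {
            # /proc/meminfo reports kB; convert to whole MiB.
            printf "ram_total=%d ram_avail=%d swap_total=%d swap_free=%d\n",
                   total / 1024, avail / 1024, stotal / 1024, sfree / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# One-off reading (skipped on systems without /proc/meminfo).
[ -r /proc/meminfo ] && mem_summary

# While the build runs in another terminal, sample every 10 seconds:
#   while true; do mem_summary; sleep 10; done
# After a failure, look for traces of the OOM killer in the kernel log:
#   dmesg | grep -i -E 'out of memory|killed process'
```

If ram_avail and swap_free both approach zero while Functions.cpp is compiling, the build is most likely memory-bound rather than truly hung.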

The last time I compiled PyTorch on my RPi ver1, I needed to increase the swap by quite a bit, and I think it took approx. a day to finish the compilation on the device (it was the first RPi with a single core).
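
Increasing the swap could look roughly like the sketch below. It assumes Raspbian's stock dphys-swapfile service, which reads its size from CONF_SWAPSIZE in /etc/dphys-swapfile. The helper set_swapsize and the 2048 MiB figure are illustrative assumptions, not values from this thread, and the SD card needs that much free space.

```shell
# Rewrite CONF_SWAPSIZE (in MiB) in a dphys-swapfile-style config file.
# Usage: set_swapsize SIZE_MB [CONFIG_FILE]
# The file argument allows a dry run against a copy of the config.
set_swapsize() {
    size_mb=$1
    conf=${2:-/etc/dphys-swapfile}
    tmp=$(mktemp) || return 1
    # Drop any existing (possibly commented-out) setting, keep everything else.
    grep -v '^#\{0,1\}CONF_SWAPSIZE=' "$conf" > "$tmp" || true
    # Append the new setting, then write the result back in place.
    echo "CONF_SWAPSIZE=$size_mb" >> "$tmp"
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Intended sequence on the Pi, from a root shell (e.g. sudo -s)
# after defining the function above:
#   dphys-swapfile swapoff     # disable the current swap file
#   set_swapsize 2048          # request 2048 MiB of swap
#   dphys-swapfile setup       # recreate the swap file at the new size
#   dphys-swapfile swapon      # enable it again
#   free -h                    # confirm the new swap total
```

Swapping to an SD card is slow and adds wear, so a day-long build is still to be expected; the larger swap only keeps cc1plus from being killed.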