TX2 (ARM64) Install for Python 3 - Continued from nVidia Forum

I'm building from source, and I can't get it to compile properly on the Jetson TX2, an ARM64 device.

I followed the recommendations from nVidia's forums.

I can get it to compile with Python 2.7, but not Python 3. I had to disable NCCL support, which is desktop-only CUDA, and that got me to about 27% of the build. You can see the nVidia conversation here: https://devtalk.nvidia.com/default/topic/1042821/pytorch-install-broken/?offset=7#5291321

For now, PyTorch keeps dying at the same spot when compiling onnx and onnx-tensorrt, both of which would obviously be required to deploy for production inferencing on the TX2. The latest errors are:

third_party/onnx-tensorrt/CMakeFiles/nvonnxparser.dir/build.make:134: recipe for target 'third_party/onnx-tensorrt/CMakeFiles/nvonnxparser.dir/onnx2trt_utils.cpp.o' failed
make[2]: *** [third_party/onnx-tensorrt/CMakeFiles/nvonnxparser.dir/onnx2trt_utils.cpp.o] Error 1
CMakeFiles/Makefile2:1488: recipe for target 'third_party/onnx-tensorrt/CMakeFiles/nvonnxparser.dir/all' failed
make[1]: *** [third_party/onnx-tensorrt/CMakeFiles/nvonnxparser.dir/all] Error 2
[ 27%] Built target python_copy_files
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash …/tools/build_pytorch_libs.sh --use-cuda --use-nnpack caffe2 libshm gloo c10d THD'

I'd be open to not compiling it at all and simply using an ONNX-exported model, but I have no idea how that would fit into my Python production code. After all, the model is only part of the PyTorch code; I still use Torch for translating image formats, evaluations, etc.
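
Just to sketch what I mean (this is a hypothetical example, not code from my project; the model, input shape, and file name are placeholders), the export side would look something like this:

import torch
import torch.nn as nn

# Placeholder model standing in for the real network
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1)).eval()

# Dummy input with the shape the model expects; torch.onnx.export traces the
# model with it and serializes the resulting graph to an .onnx file
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx")

The open question is what consumes model.onnx on the TX2 side, since the surrounding preprocessing and evaluation code still depends on Torch.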

Not sure where to go from here, but I'd really like to stick with this library and figure out inferencing for edge devices. Combined with fast.ai's recent v1.0 release, there's some pretty amazing work that can be done.

Could you post or upload the complete build log? The error seems to be missing in the current snippet.

It’s a LOT of code, so the log is too long to post here. You can view the read-only text file here:
https://www.dropbox.com/s/5qklhphl4sjdg7a/log.txt?dl=0

Here’s the setup command I ran.
sudo python3 setup.py install

It looks like ONNX has some problems with protobuf.
Did you pull from master before trying to rebuild PyTorch?
If so, could you call git submodule update --init --recursive and try to build again?
I ran into similar issues when some submodules weren’t properly updated.
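
If the protobuf suspicion turns out to be right, a quick check like this (just a sketch; what it prints depends entirely on your setup) shows which protobuf and onnx Python packages are being picked up, since a version mismatch between those and the system protobuf can cause onnx build failures:

import google.protobuf
print("protobuf:", google.protobuf.__version__)

try:
    import onnx
    print("onnx:", onnx.__version__)
except ImportError as exc:
    print("onnx not importable:", exc)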

OK, so I completely wiped it and followed the standard install directions from PyTorch, rather than the nVidia-recommended pytorch_jetson_install.sh from Dusty.

The only thing I added was these changes, as recommended by nVidia, since NCCL is only for desktop CUDA GPUs:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index f7b24b728..f75f610ed 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -93,7 +93,7 @@ option(USE_LMDB "Use LMDB" ON)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
-option(USE_NCCL "Use NCCL" ON)
+option(USE_NCCL "Use NCCL" OFF)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)

diff --git a/setup.py b/setup.py
index 99817f346..e39042b83 100644
--- a/setup.py
+++ b/setup.py
@@ -195,6 +195,7 @@ IS_LINUX = (platform.system() == 'Linux')

BUILD_PYTORCH = check_env_flag('BUILD_PYTORCH')
USE_CUDA_STATIC_LINK = check_env_flag('USE_CUDA_STATIC_LINK')
RERUN_CMAKE = True
+USE_NCCL = False

NUM_JOBS = multiprocessing.cpu_count()
max_jobs = os.getenv("MAX_JOBS")

When I do that, leave TensorRT support disabled, and run the install with Python 3, it does compile on the fresh install! I do believe I will eventually need TensorRT support on the TX2, so I'll keep plugging away at that.

For now, however, when trying to import it into a Python shell, I get this:

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nvidia/pytorch/torch/__init__.py", line 84, in <module>
    from torch._C import *
ImportError: No module named 'torch._C'
>>> exit()

Good to hear you could compile it!

Are you starting the Python shell in your build directory?
If so, could you switch to a different directory and try again?
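
To see what's going on, something like this (a hypothetical check) shows which "torch" Python would resolve from the current directory, without actually importing it; inside the source checkout it points at the un-built torch/ folder, which has no compiled torch._C extension:

import importlib.util

spec = importlib.util.find_spec("torch")
print(spec.origin if spec else "torch not found")

# From the build directory this prints something like
# /home/nvidia/pytorch/torch/__init__.py; from anywhere else it should point
# at the installed copy in dist-packages/site-packages.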

Duh! Thank you!!

So, it compiles without TensorRT support for now.
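
As a quick sanity check of the fresh Python 3 build (a sketch of what I'd run, not captured output from my board), this confirms the CUDA backend was compiled in and that the TX2's GPU is visible:

import torch

print(torch.__version__)
print(torch.cuda.is_available())          # expect True for a CUDA build
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the Tegra GPU on the TX2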

I think it's enough to start exploring the transfer of PyTorch models from desktop GPUs, via ONNX and TensorRT 3.0, to the Jetson TX2.

Thank you so much for the support here and on Twitter, @ptrblck!

You are welcome! 🙂
I'm glad it's working for now, and I'm sure we can manage to build it with TensorRT in the next iteration.

Let me know how your experiments go with deploying PyTorch through ONNX!

I got it working on the AGX Xavier. I'll take a crack at the TX2 and fix any structural problems (and commit patches to master).
