Exception with C++ and CUDA on Jetson Xavier NX

Hi,
I built a model with Python. I can load it:

  • with Python on the Jetson
  • with Python or C++ on my x86_64 PC.

With C++ on the Jetson NX, however, I get an exception:

Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 'aten::empty_strided' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
BackendSelect: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/build/aten/src/ATen/BackendSelectRegister.cpp:761 [kernel]
Named: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradCPU: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradCUDA: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradXLA: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradPrivateUse1: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradPrivateUse2: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
AutogradPrivateUse3: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/VariableType_0.cpp:7974 [autograd kernel]
Tracer: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/torch/csrc/autograd/generated/TraceType_0.cpp:9341 [kernel]
Autocast: fallthrough registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.7.0/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

In fact, it already throws on this line:
torch::Tensor tensor = at::tensor({ -1, 1 }, at::kCUDA);

at::cuda::is_available() returns true.

Loading the model on the CPU (kCPU) in C++ on the Jetson works, too.

I tried both PyTorch 1.6 and 1.7 and got the same result.
I link against:
libtorch.so
libtorch_cpu.so
libtorch_cuda.so
libc10.so
libc10_cuda.so

Any ideas?