Intermittent NvMapMemAlloc error 12 and CUDA allocator crash during PyTorch inference on Jetson Orin Nano

Hi,

I’m running PyTorch YOLO-based inference on a Jetson Orin Nano Super, and I intermittently get the following errors (not on every run, seemingly at random):

NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
Error : NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.

I tried the following, but the issue still occurs:

  • with torch.no_grad() during inference

  • os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

  • Full cleanup using torch.cuda.empty_cache(), gc.collect(), and reloading the model

The error isn’t always caught by try/except and sometimes crashes the process.
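For reference, the mitigations above combined look roughly like this in my script (a minimal sketch assuming the Ultralytics YOLO API; the weights path, get_next_frame(), and run_inference() are placeholders, not my exact code):

# Minimal sketch of the mitigations listed above; assumes the Ultralytics
# YOLO API. "yolo_model.pt" and the frame source are placeholders.
import gc
import os

# Must be set before the CUDA caching allocator is initialized.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

import torch
from ultralytics import YOLO

model = YOLO('yolo_model.pt')

def run_inference(frame):
    try:
        with torch.no_grad():
            return model.track(frame, persist=True)
    except RuntimeError as e:
        # The NvMap/allocator failure is not always surfaced as a catchable
        # RuntimeError; sometimes the process dies before this handler runs.
        print(f'Inference failed: {e}')
        torch.cuda.empty_cache()
        gc.collect()
        return None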

Setup:

  • Jetson Orin Nano

  • JetPack 6.2.1

  • PyTorch (from NVIDIA SDK): 2.5.0a0+872d972e41.nv24.08

  • Model: YOLO (tracking mode)

Questions:

  1. Is this a PyTorch issue or a Jetson memory allocator issue (NvMapMemAlloc)?

  2. Any known fix or configuration to prevent this intermittent error?

Thanks in advance for any suggestions.

Are you running out of memory? Also, which build are you using?

Hi,

I’m having the same issue.

Error:

NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0

Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/c10/cuda/CUDACachingAllocator.cpp":1131, please report a bug to PyTorch. 

System Information:

Jetson Model: NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super

OS: Linux, Ubuntu 22.04.5 LTS

CPU Model: ARMv8 Processor rev 1 (v8l)
CPU Cores: 6
Architecture: aarch64

CUDA Compiler: Cuda compilation tools, release 12.6, V12.6.68

cuDNN Version: 9.3.0 (CUDNN_MAJOR 9, CUDNN_MINOR 3, CUDNN_PATCHLEVEL 0)

TensorRT:
libnvinfer-bin 10.3.0.30-1+cuda12.5
libnvinfer-dev 10.3.0.30-1+cuda12.5
libnvinfer-dispatch-dev 10.3.0.30-1+cuda12.5

Python Version: 3.10.12

PyTorch Version: 2.8.0
PyTorch CUDA Available: True
PyTorch CUDA Version: 12.6
PyTorch cuDNN Version: 90300
PyTorch cuDNN Enabled: True

GPU Count: 1

GPU 0:
Name: Orin
Compute Capability: (8, 7)
Total Memory: 7.44 GB
Multi Processor Count: 8

TorchVision Version: 0.23.0

Any help or workarounds would be greatly appreciated.

Thanks in advance!

Thanks for the response.

I’ve monitored GPU and system memory in real time using jtop, and it doesn’t appear to be a straightforward out-of-memory condition; the crash occurs even when there’s available GPU memory.
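Roughly, I log the allocator state around each inference call to cross-check jtop (a minimal sketch, not my exact script):

# Minimal sketch: log what the CUDA driver and PyTorch's caching allocator
# report, to compare against jtop. Values are converted from bytes to GB.
import torch

def log_gpu_memory(tag=''):
    free_b, total_b = torch.cuda.mem_get_info()  # driver-level view of device memory
    allocated = torch.cuda.memory_allocated()    # memory in live tensors
    reserved = torch.cuda.memory_reserved()      # memory held by the caching allocator
    print(f'{tag} free={free_b / 1e9:.2f} GB, total={total_b / 1e9:.2f} GB, '
          f'allocated={allocated / 1e9:.2f} GB, reserved={reserved / 1e9:.2f} GB')

Calling this before and after each inference makes it easy to see whether free memory actually drops before the error appears.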

Here are my setup details:

  • Device: Jetson Orin Nano Super (8 GB RAM)

  • JetPack: 6.2.1

  • PyTorch: 2.5.0a0+872d972e41.nv24.08

  • CUDA: 12.6

  • TorchVision: 0.19.1

  • RAM: 8 GB

  • Storage: 60 GB SD card + 1 TB SSD

  • Swap Memory: 25 GB

I’m having the same issue. It’s intermittent, and I’m fairly sure the GPU memory is not full.

Collecting environment information…
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.148-tegra-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.6.68
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Orin (nvgpu)
Nvidia driver version: 540.4.0
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.4.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.4.0
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 3
Socket(s): -
Cluster(s): 2
Stepping: r0p1
CPU max MHz: 1728.0000
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 384 KiB (6 instances)
L1i cache: 384 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 4 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-5
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] onnx==1.16.2
[pip3] onnx-graphsurgeon==0.5.2
[pip3] torch==2.4.0
[pip3] torch2trt==0.5.0
[pip3] torchaudio==2.4.0a0+69d4077
[pip3] torchvision==0.19.0a0+48b1edf
[conda] Could not collect

Same issue here. @ptrblck, kind reminder. :slight_smile:

Kind reminder, @ptrblck.