Hi, I’m also hitting the same problem on an NVIDIA Jetson AGX Orin 64GB with PyTorch 2.5.0a0+872d972e41.nv24.08 on JetPack 6.0 (but running inside a JetPack 6.1-based Docker container). My workload is an LLM/VLM application, but the error is essentially the same:
Process EmbeddingProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/nvidia/myapp/process_base.py", line 188, in run
    item = self._queue.get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/reductions.py", line 149, in rebuild_cuda_tensor
    storage = storage_cls._new_shared_cuda(
  File "/usr/local/lib/python3.10/dist-packages/torch/storage.py", line 1420, in _new_shared_cuda
    return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
RuntimeError: CUDA error: invalid argument
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
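For anyone else landing on this thread: the failing path above is the consumer process rebuilding a CUDA tensor via CUDA IPC after it was put directly on a `multiprocessing.Queue`. A stopgap that avoids that rebuild entirely is to copy the tensor to CPU shared memory before enqueueing it. Below is a minimal sketch, assuming the pipeline can tolerate a device-to-host copy; the helper name `make_queue_safe` is mine for illustration, not from the app in the traceback:

```python
import torch


def make_queue_safe(t: torch.Tensor) -> torch.Tensor:
    """Prepare a tensor for a multiprocessing.Queue without CUDA IPC.

    If the tensor lives on the GPU, copy it to host memory first, then
    move its storage into shared memory. The consumer then rebuilds it
    from CPU shared memory instead of going through the
    torch.UntypedStorage._new_shared_cuda path that fails above.
    """
    if t.is_cuda:
        t = t.cpu()        # device-to-host copy; costs PCIe/bandwidth
    t.share_memory_()      # move storage into shared memory (in place)
    return t


if __name__ == "__main__":
    x = torch.arange(4, dtype=torch.float32)
    y = make_queue_safe(x)
    print(y.is_shared())   # storage is now shared, safe to enqueue
```

The obvious trade-off is the extra copy per item; whether that is acceptable depends on tensor sizes and queue throughput, so this is a workaround to unblock things, not a fix for the underlying CUDA error.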