RuntimeError: CUDA error: invalid argument

Hi, I’m also having the same problem on an NVIDIA Jetson AGX Orin 64GB with PyTorch 2.5.0a0+872d972e41.nv24.08 and JetPack 6.0 (but inside a JetPack 6.1-based Docker container). My workload is an LLM/VLM application, but the error is essentially the same:

Process EmbeddingProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/nvidia/myapp/process_base.py", line 188, in run
    item = self._queue.get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/reductions.py", line 149, in rebuild_cuda_tensor
    storage = storage_cls._new_shared_cuda(
  File "/usr/local/lib/python3.10/dist-packages/torch/storage.py", line 1420, in _new_shared_cuda
    return torch.UntypedStorage._new_shared_cuda(*args, **kwargs)
RuntimeError: CUDA error: invalid argument
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
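For context, the failure is in the consumer process: `_ForkingPickler.loads()` calls `rebuild_cuda_tensor()`, which tries to reopen the producer's CUDA IPC memory handle and gets "invalid argument" back from the driver. Below is a minimal CPU-only sketch of the queue pattern involved (runnable without a GPU), with the torch-specific parts noted in comments; the `consumer` function, the doubling payload, and the CPU-staging workaround it suggests are my own illustration, not code from the app above:

```python
import multiprocessing as mp

def consumer(q, out):
    # In the real app, this q.get() is where torch's _ForkingPickler
    # calls rebuild_cuda_tensor() and raises "CUDA error: invalid
    # argument" when the CUDA IPC handle cannot be reopened.
    item = q.get()
    out.put([x * 2 for x in item])

# "fork" keeps this CPU-only sketch simple; note that actually sharing
# CUDA tensors across processes requires the "spawn" start method.
ctx = mp.get_context("fork")
q, out = ctx.Queue(), ctx.Queue()
p = ctx.Process(target=consumer, args=(q, out))
p.start()
# Possible workaround pattern: stage tensors through host memory,
# e.g. q.put(embedding.cpu()) on the producer side and .cuda()
# after get() on the consumer side, so no CUDA IPC handle is shared.
q.put([1, 2, 3])
result = out.get()
p.join()
print(result)  # [2, 4, 6]
```

Staging through CPU memory costs a device-to-host copy per item, but it sidesteps the CUDA IPC path entirely, which can help narrow down whether the IPC handle exchange is what breaks inside the container.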