Clash between PyTorch Python package and system's CUDA? (illegal memory access)

Hi,

I am facing a strange CUDA error in my setup, and I don’t really know what’s happening. What I have in my Python code is:

  • Some PyTorch code,
  • A C++ library using CUDA, with Python wrappers.

When I use the C++ library from Python on its own, it works without any issue. However, as soon as I mix it with PyTorch, I get cudaErrorIllegalAddress: an illegal memory access was encountered inside the C++ library.

import torch
import mymodule

# If I use "cpu" for the device: no error
a = torch.randn(1, 1, dtype=torch.float, device="cuda")
# Without this multiplication: no error
b = a @ a

for i in range(100000):
    # After a few iterations: CUDA error raised in there if PyTorch
    # is used above
    obj = mymodule.MyClass()

I am using the precompiled PyTorch packages with CUDA support (e.g. 1.10.2+cu113), but my system has its own CUDA toolkit (11.4), which the C++ library is built against. Can that mismatch explain this kind of error? This comment seems to suggest that it can, but it is not confirmed.
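
To double-check which runtime each side actually uses, here is a tiny diagnostic compiled with the C++ library's toolchain (just a sketch, not part of the module); on the Python side, the bundled version is reported by torch.version.cuda:

// Minimal check of the CUDA runtime the C++ side is linked against
// (static cudart vs. the system toolkit) and of the installed driver.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // e.g. 11040 for CUDA 11.4
    cudaDriverGetVersion(&driverVersion);    // max CUDA version supported by the driver
    std::printf("runtime: %d, driver: %d\n", runtimeVersion, driverVersion);
    return 0;
}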

Some other pieces of information:

  • I tested with CUDA_LAUNCH_BLOCKING=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1, but the problem still occurs.
  • cudart is statically linked into the C++ library (I’ll try dynamic linking).
  • cuda-memcheck does not provide more information about the error in the C++ module, but it reports this error as soon as PyTorch starts being used:
========= Internal Memcheck Error: Initialization failed
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib/x86_64-linux-gnu/libcuda.so.1 [0x24bd9b]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x3127c]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x1ff4e]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x37974]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x395aa]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x2f32e]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 [0x12268]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libcudart-a7b20f20.so.11.0 (cudaMalloc + 0x10c) [0x4a55c]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libc10_cuda.so [0x26874]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so (_ZN2at6native10empty_cudaEN3c108ArrayRefIlEENS1_8optionalINS1_10ScalarTypeEEENS4_INS1_6LayoutEEENS4_INS1_6DeviceEEENS4_IbEENS4_INS1_12MemoryFormatEEE + 0x124) [0x2d605a4]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so [0x25ab39e]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so [0x25ab41a]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so [0x1d1503e]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (_ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE + 0x1c0) [0x1a1c040]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_python.so (_ZN2at5emptyEN3c108ArrayRefIlEENS0_13TensorOptionsENS0_8optionalINS0_12MemoryFormatEEE + 0xf1) [0xbc47a1]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (_ZN2at6native5randnEN3c108ArrayRefIlEENS1_8optionalINS_9GeneratorEEENS4_INS1_10ScalarTypeEEENS4_INS1_6LayoutEEENS4_INS1_6DeviceEEENS4_IbEE + 0xe9) [0x1656ce9]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (_ZN2at6native5randnEN3c108ArrayRefIlEENS1_8optionalINS1_10ScalarTypeEEENS4_INS1_6LayoutEEENS4_INS1_6DeviceEEENS4_IbEE + 0x42) [0x1656e52]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so [0x1eb5ed4]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so [0x1d28954]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so [0x1d18657]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so (_ZN2at4_ops5randn4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEE + 0x19e) [0x193bcde]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_python.so [0x8b1816]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python [0x2259b3]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (_PyObject_MakeTpCall + 0x94) [0x73e94]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (_PyEval_EvalFrameDefault + 0x6129) [0x62e79]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python [0x5bdab]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (_PyEval_EvalFrameDefault + 0x60e7) [0x62e37]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python [0x125f0a]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (PyEval_EvalCode + 0x3a) [0x12623a]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python [0x166f37]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (PyRun_FileExFlags + 0xb3) [0x168e83]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (PyRun_SimpleFileExFlags + 0xff) [0x16901f]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python [0x670df]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (Py_BytesMain + 0x6f) [0x676ff]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x2dfd0]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0x7d) [0x2e07d]
=========     Host Frame:/home/user/.pyenv/versions/env-3.9.1/bin/python (_start + 0x2e) [0x6630e]
=========
  • I also got this backtrace when PyTorch cleans up its memory:
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from uncached_delete at ../c10/cuda/CUDACachingAllocator.cpp:1460 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f466d235d62 in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1bdbe (0x7f466d497dbe in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0xa4 (0x7f466d21f314 in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x29ee09 (0x7f46222c6e09 in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0xadfdf1 (0x7f4622b07df1 in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f4622b080f2 in /home/user/.pyenv/versions/env-3.9.1/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x5bfe5 (0x5640a5bc8fe5 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #7: <unknown function> + 0x1772a5 (0x5640a5ce42a5 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #8: <unknown function> + 0x1772bd (0x5640a5ce42bd in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #9: <unknown function> + 0xa0f55 (0x5640a5c0df55 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #10: PyDict_SetItemString + 0x96 (0x5640a5c12e66 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #11: <unknown function> + 0x14827f (0x5640a5cb527f in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #12: <unknown function> + 0x160c65 (0x5640a5ccdc65 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #13: Py_BytesMain + 0x74 (0x5640a5bd4704 in /home/user/.pyenv/versions/env-3.9.1/bin/python)
frame #14: <unknown function> + 0x2dfd0 (0x7f468b22cfd0 in /lib/x86_64-linux-gnu/libc.so.6)
frame #15: __libc_start_main + 0x7d (0x7f468b22d07d in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: _start + 0x2e (0x5640a5bd330e in /home/user/.pyenv/versions/env-3.9.1/bin/python)

========= Error: process didn't terminate successfully
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f47794fc000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f478b9f7000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d90735000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d92f85000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d97c83000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d99adf000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f478b9f7000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d90735000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d92f85000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d97c83000
=========
========= Fatal UVM GPU fault of type invalid pde due to invalid address
=========     during read access to address 0x7f4d99adf000
=========
========= No CUDA-MEMCHECK results found

I will try to put together a full repro soon, but in the meantime, if this is a known problem/limitation, feel free to let me know 🙂

Try to run your code with cuda-gdb and check the backtrace once you hit the illegal memory access.
As described in the linked post, it is only rarely related to the setup; the majority of these issues are caused by incorrect code.

Alas, I already tried cuda-gdb, but it did not bring any more information. The error is raised by the OptiX library used by the C++ library, so all I can tell is that something goes wrong with the device memory.

I made another test where I have 2 GPUs: PyTorch uses the first one, and the C++ library uses the second one. This seems to solve the problem in that toy example, but it does not clearly explain what’s happening here.
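
For reference, this is roughly how the split was done in that test (a sketch; the function name is illustrative). PyTorch is given device="cuda:0" on the Python side, and the C++ wrapper selects the second GPU before doing anything else:

#include <cuda_runtime.h>

// Illustrative only: force all of the library's subsequent allocations and
// launches on this host thread onto the second GPU, so that PyTorch (on
// device 0) and the C++ library (on device 1) never share a device.
void init_library_on_second_gpu() {
    cudaSetDevice(1);
    // ... library-specific setup (OptiX context, buffers, ...) ...
}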

Are you using Unified Memory in OptiX? Based on the error message, it seems the illegal memory access is caused by a UVM page fault. Could you disable it and rerun the test?

AFAICS, UVM is not supported in OptiX, as explained here (I got the exact same error when I tried it). I’m using OptiX 7.3 with plain old cudaMalloc for the allocations. The optixAccelBuild function where the error happens is poorly documented, so it has been a lot of trial and error. We basically provide the memory that OptiX uses to build its acceleration structures, and it starts by doing some sanity checks on it.
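
For context, the allocation pattern around optixAccelBuild looks roughly like this (a simplified sketch assuming a valid OptixDeviceContext and an already-filled OptixBuildInput; error checking omitted, and buildGas is just an illustrative name):

#include <optix.h>
#include <cuda_runtime.h>

// We query the required sizes, allocate the buffers ourselves with
// cudaMalloc, and hand them to optixAccelBuild; the sanity checks inside
// optixAccelBuild are where the illegal memory access is reported.
OptixTraversableHandle buildGas(OptixDeviceContext context,
                                const OptixBuildInput& buildInput,
                                CUstream stream) {
    OptixAccelBuildOptions options = {};
    options.buildFlags = OPTIX_BUILD_FLAG_NONE;
    options.operation  = OPTIX_BUILD_OPERATION_BUILD;

    OptixAccelBufferSizes sizes = {};
    optixAccelComputeMemoryUsage(context, &options, &buildInput, 1, &sizes);

    CUdeviceptr dTemp = 0, dOutput = 0;
    cudaMalloc(reinterpret_cast<void**>(&dTemp), sizes.tempSizeInBytes);
    cudaMalloc(reinterpret_cast<void**>(&dOutput), sizes.outputSizeInBytes);

    OptixTraversableHandle handle = 0;
    optixAccelBuild(context, stream, &options, &buildInput, 1,
                    dTemp, sizes.tempSizeInBytes,
                    dOutput, sizes.outputSizeInBytes,
                    &handle, nullptr, 0);

    // The temporary buffer can be released after the build; the output
    // buffer must stay alive as long as the traversable handle is used.
    cudaFree(reinterpret_cast<void*>(dTemp));
    return handle;
}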

The next thing I tried was to build the C++ library on the fly as a PyTorch extension, in order to rule out potential build issues, but it did not change the problem.
Then I figured that I might as well try to use the PyTorch CUDA allocator (c10::cuda::CUDACachingAllocator::raw_alloc() and c10::cuda::CUDACachingAllocator::raw_delete()) instead of direct cudaMalloc()/cudaFree(), and the errors seem to be gone, except when I use PYTORCH_NO_CUDA_MEMORY_CACHING=1 to prevent the caching of allocations on the Python side.
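
Concretely, the change in the C++ library is along these lines (a sketch with error handling omitted; the function names are illustrative):

#include <c10/cuda/CUDACachingAllocator.h>
#include <cuda_runtime.h>
#include <cstddef>

// Before: plain CUDA runtime allocations inside the library; this is the
// path that ends up with the illegal memory access when PyTorch is used
// in the same process.
void* allocate_buffer_cuda(std::size_t nbytes) {
    void* ptr = nullptr;
    cudaMalloc(&ptr, nbytes);
    return ptr;
}

// After: route the library's device allocations through PyTorch's caching
// allocator, so both sides share the same allocation machinery.
void* allocate_buffer_torch(std::size_t nbytes) {
    return c10::cuda::CUDACachingAllocator::raw_alloc(nbytes);
}

void free_buffer_torch(void* ptr) {
    c10::cuda::CUDACachingAllocator::raw_delete(ptr);
}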

EDIT: I also tried c10::cuda::CUDACachingAllocator::get()->allocate() in the C++ code, but I got errors as well.
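
For completeness, that attempt looked roughly like this (sketch only; it still produced the errors on my side):

#include <c10/cuda/CUDACachingAllocator.h>
#include <cstddef>

// allocate() returns a c10::DataPtr that owns the device memory and frees
// it through the caching allocator when it goes out of scope.
c10::DataPtr allocate_buffer_dataptr(std::size_t nbytes) {
    return c10::cuda::CUDACachingAllocator::get()->allocate(nbytes);
}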