CUDA error: an illegal memory access was encountered

Is there any way to debug this? I'm getting:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

PyTorch 2.7.1, CUDA 11.8

I usually use cuda-memcheck. You can run your script under it with:

cuda-memcheck python your_script.py

Alternatively, enable synchronous error reporting so the stack trace points at the call that actually failed:

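# Launch kernels synchronously so the error is raised at the failing call
export CUDA_LAUNCH_BLOCKING=1
# On a device exception, pause the process so cuda-gdb can attach to it
export CUDA_DEVICE_WAITS_ON_EXCEPTION=1
# Note: TORCH_USE_CUDA_DSA only takes effect when building PyTorch from source;
# exporting it at runtime does nothing for prebuilt wheels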

# Then run your script
python your_script.py
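The `export` lines keep the variables set for the rest of the shell session. A minimal sketch of an alternative is a small shell function that scopes `CUDA_LAUNCH_BLOCKING` to a single command. The name `run_cuda_debug` is hypothetical. Leave the variable unset for normal runs, since synchronous launches slow everything down.

```shell
# Hypothetical helper: run one command with synchronous CUDA kernel launches,
# without exporting the variable into the rest of the shell session.
run_cuda_debug() {
  # A variable assignment placed before a command applies only to that command.
  CUDA_LAUNCH_BLOCKING=1 "$@"
}

# Usage:
#   run_cuda_debug python your_script.py
```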

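cuda-memcheck is deprecated (it was removed in CUDA 12), so use compute-sanitizer instead, and disable PyTorch's caching allocator via PYTORCH_NO_CUDA_MEMORY_CACHING=1. The caching allocator carves tensors out of larger cached blocks, so without this an out-of-bounds access can land in memory that is still a valid allocation and go undetected.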
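A minimal sketch of that combination as a reusable shell function. The name `run_sanitized` is hypothetical; it only wraps the real `compute-sanitizer --tool memcheck` invocation (memcheck is also the default tool) and sets `PYTORCH_NO_CUDA_MEMORY_CACHING=1` for that one command.

```shell
# Hypothetical helper: run a command under compute-sanitizer's memcheck tool
# with PyTorch's caching allocator disabled, so each tensor gets its own
# device allocation and out-of-bounds accesses are not hidden inside a
# larger cached block.
run_sanitized() {
  if ! command -v compute-sanitizer >/dev/null 2>&1; then
    echo "compute-sanitizer not found; it ships with the CUDA toolkit" >&2
    return 127
  fi
  PYTORCH_NO_CUDA_MEMORY_CACHING=1 compute-sanitizer --tool memcheck "$@"
}
```

For example, `run_sanitized python your_script.py`. Expect the run to be considerably slower than normal, since the sanitizer instruments every device memory access.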
