RuntimeError: CUDA error: device-side assert triggered with Llama2

nitay_shlump · August 18, 2025, 7:04am

I passed “pad_token_id=tokenizer.pad_token_id” to model.generate, and now I get only this error:

/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
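This assertion most likely comes from the sampling step (torch.multinomial, which generate uses when do_sample=True). Before drawing the next token, PyTorch checks that each row of probabilities is a valid distribution. The following is a stdlib-only sketch of that check, not PyTorch code; the names softmax and is_valid_distribution are hypothetical. It shows how a single non-finite logit, for example one produced by numerical overflow, turns every probability in the row into nan:

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating (standard stability trick).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_valid_distribution(probs):
    # Same conditions as the assert message: no inf, no nan, no element < 0.
    return all(math.isfinite(p) and p >= 0 for p in probs)

# Finite logits give a well-formed distribution.
print(is_valid_distribution(softmax([1.0, 2.0, 3.0])))           # True

# One inf logit makes the max inf, and inf - inf is nan,
# so the sum and every normalized entry become nan.
print(is_valid_distribution(softmax([1.0, float("inf"), 3.0])))  # False
```

In other words, the assert is a symptom. The bad values are already present in the model's logits before sampling happens.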

I suspect the issue comes from using BitsAndBytesConfig (quantization), but I don’t want to drop it, because generation takes too long without it.

Edit:
The error went away when I passed “pad_token_id=tokenizer.eos_token_id” to model.generate and changed device_map='auto' to device_map={"": 0} when loading the model.
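A minimal sketch of those two changes as keyword arguments. The helper name generate_kwargs, the LOAD_KWARGS constant, and max_new_tokens=128 are hypothetical choices for illustration. The stand-in tokenizer assumes Llama 2's usual setup of EOS id 2 and no pad token. The transformers calls appear only in comments so the snippet runs without a GPU or model download:

```python
from types import SimpleNamespace

# Load-time kwargs: "" refers to the root module, so the whole model goes
# on GPU 0 instead of letting device_map="auto" spread layers across devices.
LOAD_KWARGS = {"device_map": {"": 0}}

def generate_kwargs(tokenizer, max_new_tokens=128):
    # Pad with the EOS token id, as in the fix above; Llama 2 tokenizers
    # usually define no pad token, so pad_token_id is often None.
    return {"max_new_tokens": max_new_tokens,
            "pad_token_id": tokenizer.eos_token_id}

# With transformers, these would be used roughly as (model_id, bnb_config
# and inputs are placeholders):
#   model = AutoModelForCausalLM.from_pretrained(
#       model_id, quantization_config=bnb_config, **LOAD_KWARGS)
#   out = model.generate(**inputs, **generate_kwargs(tokenizer))

# Hypothetical stand-in for a Llama 2 tokenizer.
tok = SimpleNamespace(eos_token_id=2, pad_token_id=None)
print(generate_kwargs(tok))  # {'max_new_tokens': 128, 'pad_token_id': 2}
```

Keeping the quantization_config unchanged reflects the post: quantization stays in place, and only the device placement and pad token id differ.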

Other solutions can be found in the model's GitHub repository: issue #380 in meta-llama/llama, “RuntimeError: probability tensor contains either `inf`, `nan` or element < 0”.