Hi everyone,
I used a model fine-tuned with unsloth, and I got this error. It ran fine for the first few loops, but this error showed up after about 2 minutes.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I also got this warning:
…/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
The code that triggers this error:
inputs = tokenizer(
    [
        alpaca_prompt.format(
            instruction,      # instruction
            tables[table_i],  # input
            "",               # output - leave this blank for generation!
        )
    ], return_tensors="pt").to("cuda:0")
An indexing operation is failing. Rerun your code via CUDA_LAUNCH_BLOCKING=1 python script.py args
to isolate the failing line of code.
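Another way to get a readable error is to run the failing indexing op on the CPU first: out-of-range indices that trigger an opaque device-side assert on CUDA raise a plain Python IndexError on CPU. A minimal sketch with made-up sizes (the embedding dimensions and token ids below are illustrative, not from the model in question):

```python
import torch

# Hypothetical embedding with a 10-token vocabulary
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

# Token id 12 is out of range for a size-10 vocab
bad_ids = torch.tensor([3, 12])

try:
    embedding(bad_ids)  # on CPU this raises a clear IndexError
except IndexError as e:
    print("caught:", e)
```

The same lookup on a GPU would only surface as the asynchronous "device-side assert triggered" error.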
Dear ptrblck,
Thanks for your reply. After I ran this with CUDA_LAUNCH_BLOCKING=1, I still got this:
File ~/anaconda3/envs/FTllama/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:800, in BatchEncoding.to(self, device)
796 # This check catches things like APEX blindly calling "to" on all inputs to a module
797 # Otherwise it passes the casts down and casts the LongTensor containing the token idxs
798 # into a HalfTensor
799 if isinstance(device, str) or is_torch_device(device) or isinstance(device, int):
→ 800 self.data = {k: v.to(device=device) for k, v in self.data.items()}
801 else:
802 logger.warning(f"Attempting to cast a BatchEncoding to type {str(device)}. This is not supported.")
File ~/anaconda3/envs/FTllama/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:800, in <dictcomp>(.0)
796 # This check catches things like APEX blindly calling "to" on all inputs to a module
797 # Otherwise it passes the casts down and casts the LongTensor containing the token idxs
798 # into a HalfTensor
799 if isinstance(device, str) or is_torch_device(device) or isinstance(device, int):
→ 800 self.data = {k: v.to(device=device) for k, v in self.data.items()}
801 else:
802 logger.warning(f"Attempting to cast a BatchEncoding to type {str(device)}. This is not supported.")
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
The stacktrace is unfortunately still wrong. Did you export this env variable in your terminal? If so, check whether any embedding layers are used, as their input often contains indices that are out of the valid range.
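One way to check this is to compare the token ids against the embedding's vocabulary size before the lookup. A minimal sketch, with hypothetical sizes and a deliberately out-of-range id for illustration:

```python
import torch

# Hypothetical embedding with a 32,000-token vocabulary
embedding = torch.nn.Embedding(num_embeddings=32000, embedding_dim=16)

# Simulated tokenizer output; 32005 >= 32000 would trip the CUDA assert
input_ids = torch.tensor([[1, 15, 32005]])

# Validate on the host instead of letting the device-side assert fire
invalid = (input_ids < 0) | (input_ids >= embedding.num_embeddings)
if invalid.any():
    print("out-of-range ids:", input_ids[invalid].tolist())
```

With a real model you would compare against `model.get_input_embeddings().num_embeddings`; a mismatch usually means the tokenizer's vocabulary is larger than the model's embedding table.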
Dear ptrblck,
Thanks, I noticed that my input is indeed out of the valid range.