Training on Multiple GPU with Transformers Library

Hello,

I have two GPUs, and during training I’m getting the exception below.

/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
(the same assertion repeats for threads [65,0,0] through [72,0,0])

trainer.train()
  File "/python3.12/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^

I’m using the Trainer class from the Transformers library, with:

model.is_parallelizable = True
model.model_parallel = True

Can someone please help me with this?

Thanks.

The error points to an indexing failure, which is most likely caused by an embedding layer receiving input containing out-of-bounds indices.
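For illustration, here is a minimal sketch of how this failure arises (the tensor values are made up): an nn.Embedding with N rows only accepts indices 0..N-1, and anything outside that range triggers exactly this kind of index assert. On the CPU it surfaces as an immediate IndexError; on CUDA it shows up as the device-side assert from IndexKernel.cu.

```python
import torch
import torch.nn as nn

# An embedding table with 10 rows accepts indices 0..9.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

valid = torch.tensor([0, 5, 9])
out = emb(valid)  # fine: output shape is (3, 4)

invalid = torch.tensor([0, 5, 10])  # 10 is out of bounds for 10 rows
try:
    emb(invalid)  # on CPU this raises IndexError right away;
                  # on CUDA it becomes the asynchronous device-side assert
except IndexError as e:
    print("caught:", e)
```

A common way to hit this with Transformers is a tokenizer whose vocabulary is larger than the model's embedding table, e.g. after adding special tokens without resizing the embeddings.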

Thanks for the reply @ptrblck.

How can I track down and resolve this issue?

Thanks.

Rerun your code with blocking launches via CUDA_LAUNCH_BLOCKING=1, as described in the full error message (which is missing here), to isolate the failing operation. Once you have found it, check its inputs, in particular their min/max values, and make sure they are in bounds.
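One way to apply this check is to validate the token ids against the vocabulary size before they reach the model, so you get a readable Python error instead of a device-side assert (the helper name and batch values below are made up for illustration):

```python
import torch

def check_token_ids(input_ids: torch.Tensor, vocab_size: int) -> None:
    """Fail early with a readable message if any id is out of range."""
    lo = input_ids.min().item()
    hi = input_ids.max().item()
    if lo < 0 or hi >= vocab_size:
        raise ValueError(
            f"token ids out of range: min={lo}, max={hi}, "
            f"vocab_size={vocab_size}"
        )

# Example: a batch with a stray id equal to vocab_size (one past the end).
batch = torch.tensor([[1, 2, 3], [4, 5, 32000]])
try:
    check_token_ids(batch, vocab_size=32000)
except ValueError as e:
    print(e)
```

If the mismatch comes from new tokens added to the tokenizer, calling model.resize_token_embeddings(len(tokenizer)) before training makes the embedding table large enough for the extended vocabulary.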