Hello,
I have two GPUs, and during training I’m getting the exception below.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/cuda/IndexKernel.cu:92: operator(): block: [98,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
    trainer.train()
  File "/python3.12/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
I’m using Trainer from the Transformers library, with the following model settings:
model.is_parallelizable = True
model.model_parallel = True
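The assertion in the log is a bounds check. An index into a tensor dimension of size sizes[i] must satisfy -sizes[i] <= index < sizes[i], and the log alone doesn't show which tensor was being indexed. The sketch below reproduces that same check in plain Python. It assumes the suspect indices (for example, a batch of token IDs or labels) can be pulled out as plain integers. The function name out_of_bounds and the size 32000 are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def out_of_bounds(indices: Iterable[int], size: int) -> List[int]:
    """Return the indices that would fail the check
    -size <= index < size asserted in IndexKernel.cu."""
    # Negative indices are allowed down to -size (Python-style wrap-around);
    # anything outside [-size, size) would trip the device-side assert.
    return [i for i in indices if not (-size <= i < size)]


# Hypothetical example: IDs from one batch checked against a
# hypothetical dimension size of 32000.
dim_size = 32000
batch_ids = [1, 15043, 31999, 32000, 32005, -1]
print(out_of_bounds(batch_ids, dim_size))  # [32000, 32005]
```

CUDA kernels run asynchronously, so the Python traceback may not point at the operation that actually failed. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback lines up with the failing operation.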
Can someone please help me with this?
Thanks.