[solved] Assertion `srcIndex < srcSelectDimSize` failed on GPU for `torch.cat()`

Has anyone found a solution by chance? I get the same error when launching a training from scratch of the Hugging Face RoBERTa and BERT models (transformers/examples/language-modeling at master · huggingface/transformers · GitHub). I get many, many of these errors:

/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [372,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Then the stack trace:

Traceback (most recent call last):
  File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 491, in <module>
    main()
  File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 457, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1053, in train
    tr_loss += self.training_step(model, inputs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1443, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1475, in compute_loss
    outputs = model(**inputs)
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 1057, in forward
    return_dict=return_dict,
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 810, in forward
    past_key_values_length=past_key_values_length,
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 123, in forward
    embeddings += position_embeddings
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa4517ed1e2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fa451a3bf92 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fa4517db9cd in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #3: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x25a (0x7fa427f8489a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: c10d::Reducer::~Reducer() + 0x28a (0x7fa427f79b1a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fa427f593c2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fa4277577a6 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0xa6b08b (0x7fa427f5a08b in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x273c00 (0x7fa427762c00 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x274e4e (0x7fa427763e4e in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #22: main + 0x16e (0x400a3e in /data/medioli/env/bin/python3)
frame #23: __libc_start_main + 0xf5 (0x7fa48f4903d5 in /lib64/libc.so.6)
frame #24: /data/medioli/env/bin/python3() [0x400b02]

Hi smth, I found another bug related to this.
If I define two tensors in a Jupyter notebook, like

a = torch.randn(2, 3)
b = torch.tensor([2, 3])

where b contains indices that are out of bounds for a.
If I run a[b] in a new cell of this notebook, the error from this topic appears.
However, when I define a new tensor c like this:

c = torch.tensor([3, 3])
c = c.cuda()

the same error appears again: RuntimeError: CUDA error: device-side assert triggered.
Could you tell me how to deal with that?
Thank you!

Your index tensor contains out-of-bounds values, as PyTorch tensors use 0-based indexing. Once you hit a sticky CUDA error, the CUDA context is corrupted and you need to reset it.
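
For illustration, a minimal sketch of the difference: on the CPU the same out-of-bounds lookup fails immediately with a readable IndexError, while on the GPU it surfaces as the asynchronous device-side assert from this thread and poisons every CUDA call that follows.

import torch

a = torch.randn(2, 3)
b = torch.tensor([2, 3])  # valid indices along dim 0 of `a` are only 0 and 1

# CPU: the bad lookup fails right away with a clear message.
try:
    a[b]
except IndexError as e:
    print(e)  # index 2 is out of bounds for dimension 0 with size 2

# GPU: the same lookup triggers the device-side assert asynchronously, and
# once that happens the CUDA context is corrupted, so even an unrelated call
# like c = torch.tensor([3, 3]).cuda() keeps reporting the same error until
# the process (or notebook kernel) is restarted.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    # a_gpu[b_gpu]  # would raise: RuntimeError: CUDA error: device-side assert triggered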


Thank you for your reply! Wish you all the best!

I had a similar error when I added additional tokens and didn't call:
model.resize_token_embeddings(len(tokenizer))
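
For completeness, a minimal sketch of that fix (checkpoint and token names are only examples): add the tokens first, then resize the embedding matrix so the new token IDs stay within range.

from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Example extra tokens; the important part is the resize afterwards.
num_added = tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])

# Without this, the new token IDs are >= the old vocab size and the embedding
# lookup fails on the GPU with `srcIndex < srcSelectDimSize`.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))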


Hi,
I am encountering the index issue too, and I am adding additional tokens as well.
But even with resize_token_embeddings I am still getting the error :confused:
Any ideas?

Error:

indexSelectLargeIndex: block: [119,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Same here!
Did you manage to resolve this problem?

Check which indexing layer is failing and the min./max. values of the input tensor. Then make sure the range of the index tensor is valid for the layer.
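
A rough sketch of such a check, assuming a RoBERTa/BERT-style batch with the usual key names; it compares the range of each index tensor against the corresponding embedding size from the model config:

import torch

def report_index_ranges(batch, config):
    # Upper bounds (exclusive) for the index tensors a BERT/RoBERTa-style
    # model looks up in its embedding tables.
    limits = {
        "input_ids": config.vocab_size,
        "token_type_ids": getattr(config, "type_vocab_size", None),
        "position_ids": getattr(config, "max_position_embeddings", None),
    }
    for name, limit in limits.items():
        t = batch.get(name)
        if isinstance(t, torch.Tensor) and limit is not None:
            print(f"{name}: min={t.min().item()}, max={t.max().item()}, "
                  f"allowed=[0, {limit - 1}], seq_len={t.shape[-1]}")

# usage (hypothetical names): report_index_ranges(next(iter(train_dataloader)), model.config)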

Thanks for the reply!
I checked the range of the inputs (min/max). All of them are valid when I feed them to the model (tokenizer and DataCollator).

In that case the assert shouldn't be raised, so make sure you are checking the right input tensor.

Thank you very much for your solution, which solves my problem perfectly!


How did you solve this problem? I just need to know, as I am facing the same issue while training a LayoutLM model on a downstream task.

"RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
"

Rerun your code with blocking launches as described here and check the values of the indexing tensor as well as all shapes of tensors related to the failing operation.
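
Concretely, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the assert is raised at the Python line that actually launched the failing kernel instead of at some later synchronisation point. A sketch:

import os

# Must be set before CUDA is initialised, ideally before `import torch`
# (or run the script as: CUDA_LAUNCH_BLOCKING=1 python train.py).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# ... now rerun the failing code; the stack trace points at the real call site.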

I tried to rerun the code and am now getting this error:

…/aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [39,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: CUDA error: device-side assert triggered.

The stack trace seems to point to the embedding layer, so check its inputs and make sure the indices are in [0, num_embeddings - 1].
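
A small helper along those lines (a sketch; call it on the batch right before the forward pass) fails on the CPU with a readable message instead of inside a CUDA kernel:

import torch
from torch import nn

def assert_ids_in_range(input_ids: torch.Tensor, embedding: nn.Embedding) -> None:
    lo, hi = input_ids.min().item(), input_ids.max().item()
    assert 0 <= lo and hi < embedding.num_embeddings, (
        f"input_ids range [{lo}, {hi}] is invalid for an embedding with "
        f"num_embeddings={embedding.num_embeddings}"
    )

# usage (hypothetical names): assert_ids_in_range(batch["input_ids"], model.get_input_embeddings())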

Hi everyone. Compiling the model solved this problem for us, but why would it?

if torch.cuda.is_available():
    self.model = torch.compile(self.model).to(self.device)

Compiling the model should not fix real indexing errors, and I would be careful with this “solution”, as in the worst case the error is simply not raised anymore.

For me the problem was that I used the wrong tokenizer to preprocess my data.

Changing the tokenizer solved the issue.

A heads-up for people working with the SentenceTransformer library: this error will also be shown when one tries to encode a sentence with more tokens than defined by max_position_embeddings, e.g. 512 tokens. I had a sentence in my corpus with about 561 tokens and then I got this error.
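
If you want to confirm that before encoding, a rough sketch (the model name is just an example, and it assumes the underlying module exposes a Hugging Face tokenizer as model.tokenizer):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model name
sentences = ["a normal sentence", "a very long document " * 200]

# Token count per sentence according to the model's own tokenizer.
lengths = [len(model.tokenizer(s)["input_ids"]) for s in sentences]
print("token counts:", lengths, "model limit:", model.max_seq_length)

# Keeping max_seq_length at or below the position-embedding limit forces
# truncation instead of letting over-long inputs hit the device-side assert.
model.max_seq_length = 256
embeddings = model.encode(sentences)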


If you’ve encountered a problem similar to @david.waterworth’s when using RoBERTa from the transformers library, ensure that you set the max_length for tokenization to max_position_embeddings - 2. Alternatively, you can directly set tokenizer.model_max_length to max_position_embeddings - 2, which removes the need to specify it explicitly during tokenization.

Note on Default Settings:
If you’re using the roberta-base model, you might not even notice this issue, as the default setting already takes care of it.
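
A sketch of both options, using roberta-base as the example checkpoint:

from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# RoBERTa offsets positions by the padding index, hence the "- 2".
safe_len = config.max_position_embeddings - 2  # 514 - 2 = 512 for roberta-base

# Option 1: pass max_length explicitly at tokenization time.
enc = tokenizer("some very long text ...", truncation=True, max_length=safe_len)

# Option 2: set it once on the tokenizer, so truncation=True is enough later.
tokenizer.model_max_length = safe_len
enc = tokenizer("some very long text ...", truncation=True)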

Monitoring Tensor Shapes within a Seq2Seq Training Loop:
For those who wish to monitor tensor shapes during the training loop using the Seq2Seq trainer, you can implement a custom Seq2Seq trainer. This custom trainer logs the full tensors and their shapes for the first two steps, then only the shapes up to the first 10,000 steps.

Code Sample
Here’s how you can set up a custom Seq2Seq trainer:

import logging

import torch
from transformers import Seq2SeqTrainer

logger = logging.getLogger(__name__)


class CustomSeq2SeqTrainer(Seq2SeqTrainer):
    def __init__(self, *args, **kwargs):
        super(CustomSeq2SeqTrainer, self).__init__(*args, **kwargs)
        self.step_count = 0  # Initialize a step counter attribute

    def training_step(self, model, inputs):
        self.step_count += 1  # Increment the step counter

        # Log tensor details for the first two steps
        if self.step_count <= 2:
            for k, v in inputs.items():
                if isinstance(v, torch.Tensor):
                    logger.info(f"Step {self.step_count} -- {k}: Shape={v.shape}")
                    logger.info(f"Step {self.step_count} -- {k}: Tensor={v}")

        # Log tensor shapes for the first 10,000 steps
        elif self.step_count <= 10000:
            for k, v in inputs.items():
                if isinstance(v, torch.Tensor):
                    logger.info(f"Step {self.step_count} -- {k}: Shape={v.shape}")

        return super(CustomSeq2SeqTrainer, self).training_step(model, inputs)  # Call the parent class's method

# ------ Initialize the Custom Trainer ------ #
logger.info("Setting up the trainer...")
trainer = CustomSeq2SeqTrainer(
    # Add your additional arguments and configurations here
)

With this setup, you’ll have a clear view of what is happening with your tensors during the training loop.