[solved] Assertion `srcIndex < srcSelectDimSize` failed on GPU for `torch.cat()`

This comment was the answer in my case when using pretrained language models.
When using a provided tokenizer, make sure the embedding size matches the model you are using.
The cause is not obvious because the error appears ‘out of nowhere’, only when the one document with the longer embedding shows up, and because it does not come from a bug in the prediction/training loop itself.
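One way to find the offending document is to check the token ids on the CPU before the batch reaches the model. This is only a rough sketch: `find_out_of_range_ids` is a name I made up, and it takes plain Python lists instead of tensors so it runs without a GPU. `num_embeddings` stands for the number of rows in the model's embedding table.

```python
def find_out_of_range_ids(batch_ids, num_embeddings):
    """Return (row, column, token_id) for every id that cannot index the table."""
    bad = []
    for row, ids in enumerate(batch_ids):
        for col, token_id in enumerate(ids):
            # Valid indices for an embedding table are 0 .. num_embeddings - 1.
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, col, token_id))
    return bad


# Hypothetical batch: the second document contains id 12, but the table has 10 rows.
batch = [[0, 5, 9, 2], [0, 7, 12, 2]]
problems = find_out_of_range_ids(batch, num_embeddings=10)
print(problems)  # [(1, 2, 12)]
```

Running the failing batch on the CPU, or setting `CUDA_LAUNCH_BLOCKING=1`, also tends to give an error that points closer to the real line.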

In my case it was a mismatched tokenizer.pad_token_id. Your model and tokenizer should use the same one :smiley:

Increase max_position_embeddings from its default value of 512:

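from transformers import RobertaConfig, RobertaForMaskedLM

# Allow longer inputs than the default of 512 positions
config = RobertaConfig(
    max_position_embeddings=1024,
)

model = RobertaForMaskedLM(config=config)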
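Note that the usable sequence length is a bit smaller than max_position_embeddings. As far as I understand the Hugging Face RoBERTa code, position ids start at padding_idx + 1 and padding tokens get padding_idx itself. The sketch below mimics that numbering in plain Python under that assumption; both function names are mine, not part of the library.

```python
def roberta_style_position_ids(input_ids, padding_idx=1):
    """Number non-padding tokens from padding_idx + 1; padding keeps padding_idx."""
    position_ids = []
    seen = 0
    for token_id in input_ids:
        if token_id == padding_idx:
            position_ids.append(padding_idx)
        else:
            seen += 1
            position_ids.append(padding_idx + seen)
    return position_ids


def max_tokens_that_fit(max_position_embeddings, padding_idx=1):
    """Largest number of non-padding tokens whose positions stay inside the table."""
    # The highest position used is padding_idx + n, and it must be
    # at most max_position_embeddings - 1.
    return max_position_embeddings - padding_idx - 1


example_positions = roberta_style_position_ids([0, 31414, 232, 2, 1, 1])
print(example_positions)          # [2, 3, 4, 5, 1, 1]
print(max_tokens_that_fit(1024))  # 1022
print(max_tokens_that_fit(514))   # 512
```

This is also why a wrong pad id matters: if the tokenizer pads with an id the model does not treat as padding, the padded positions are counted too and the position ids can run past the table.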

I think this is a vocab size problem; the same error came up in my code snippet. The reason was that I added 2 extra tokens (‘sos’ and ‘eos’) to the vocab but forgot to increase the vocab size.
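To illustrate the vocab case with a toy example: the embedding table needs one row per token id, so adding two tokens means two more rows. The vocab and helper names below are made up, and the sketch assumes ids are contiguous starting at 0.

```python
def add_tokens(vocab, new_tokens):
    """Give each token that is not in the vocab yet the next free id."""
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # assumes ids are 0 .. len(vocab) - 1
    return vocab


def required_embedding_rows(vocab):
    """The embedding table must have at least max id + 1 rows."""
    return max(vocab.values()) + 1


vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3}
old_rows = required_embedding_rows(vocab)
add_tokens(vocab, ["sos", "eos"])
new_rows = required_embedding_rows(vocab)
print(old_rows, new_rows)  # 4 6
```

If the model was built with the old size, the ids of the two new tokens fall outside the table. With Hugging Face transformers, calling `model.resize_token_embeddings(len(tokenizer))` after adding tokens to the tokenizer is the usual way to grow it.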