[solved] Assertion `srcIndex < srcSelectDimSize` failed on GPU for `torch.cat()`

darkraisisi · January 15, 2024, 11:39am

This comment was the answer in my case when using pretrained language models.
When using the provided tokenizers, make sure the embedding size matches the model you are using.
This is less obvious because the error appears ‘out of nowhere’, only when the one document with the longer embedding shows up, and because it is not caused by a bug in the prediction/training loop.
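One way to track down the offending document before it reaches the GPU is to scan the tokenized data for sequences longer than the model allows. This is only a minimal sketch: it assumes the "longer embedding" is a tokenized document that exceeds the model's length limit, and `find_too_long`, `docs` and the limit of 5 are made-up names and values for illustration.

```python
def find_too_long(tokenized_docs, max_length):
    """Return (index, length) for every document longer than max_length."""
    return [
        (i, len(ids))
        for i, ids in enumerate(tokenized_docs)
        if len(ids) > max_length
    ]


# Hypothetical tokenized corpus: each inner list is one document's token IDs.
docs = [[0, 5, 9, 2], [0, 7, 2], [0, 4, 4, 4, 4, 4, 2]]

offenders = find_too_long(docs, max_length=5)
print(offenders)  # [(2, 7)]
```

With a real dataset, `max_length` would be whatever limit your model actually supports; running the check on CPU up front avoids waiting for the rare long document to trigger the assertion mid-training.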

jjamnicki · January 25, 2024, 6:06pm

In my case it was a mismatched tokenizer.pad_token_id. Your model and tokenizer should use the same one.
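A quick sanity check along these lines can be sketched as follows. `check_pad_token` is a hypothetical helper, not a library function, and the numbers are made up. With Hugging Face models the three values would typically come from `tokenizer.pad_token_id`, `model.config.pad_token_id` and `model.get_input_embeddings().num_embeddings`, but treat that mapping as an assumption about your setup.

```python
def check_pad_token(tokenizer_pad_id, model_pad_id, embedding_rows):
    """Return a list of problems found; an empty list means the IDs agree."""
    problems = []
    if tokenizer_pad_id is None:
        problems.append("tokenizer has no pad token")
    elif not 0 <= tokenizer_pad_id < embedding_rows:
        # An ID outside the table is exactly what trips the index assertion.
        problems.append(
            f"pad id {tokenizer_pad_id} is outside the embedding table "
            f"(valid range 0..{embedding_rows - 1})"
        )
    if tokenizer_pad_id != model_pad_id:
        problems.append(
            f"tokenizer pad id {tokenizer_pad_id} != model pad id {model_pad_id}"
        )
    return problems


print(check_pad_token(1, 1, 50265))      # []
print(check_pad_token(50265, 1, 50265))  # out of range and mismatched
```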

tj112 · February 20, 2024, 6:32pm

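Increase max_position_embeddings from its default value of 512:

from transformers import RobertaConfig, RobertaForMaskedLM

# Allow sequences longer than the default 512 positions
config = RobertaConfig(
    max_position_embeddings=1024,
)

model = RobertaForMaskedLM(config=config)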
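One detail worth keeping in mind: as far as I understand the Hugging Face RoBERTa implementation, position IDs start at padding_idx + 1 rather than 0, so the longest usable sequence is a little shorter than max_position_embeddings. The arithmetic below is a sketch under that assumption; `max_usable_length` is a made-up helper, and padding_idx = 1 is assumed to be the pad token ID.

```python
def max_usable_length(max_position_embeddings, padding_idx=1):
    """Longest sequence whose position IDs still fit in the position table.

    Assumes real tokens get position IDs padding_idx + 1 .. padding_idx + seq_len,
    and that every ID must be < max_position_embeddings.
    """
    return max_position_embeddings - padding_idx - 1


print(max_usable_length(512))   # 510
print(max_usable_length(1024))  # 1022
```

So if your inputs are truncated to exactly 512 tokens, a table of 512 positions may still be two rows short under this assumption.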

pratyaksh_agarwal · April 23, 2024, 9:37am

I think this is caused by the vocab size; the same problem came up in my code snippet. The reason was that I extended the vocab with 2 extra tokens (‘sos’ and ‘eos’) but forgot to increase the vocab size.
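A cheap way to catch this on CPU is to look for token IDs that the embedding table cannot index. The sketch below uses plain lists; `out_of_range_ids`, the vocab size of 100 and the IDs chosen for ‘sos’ and ‘eos’ are all made up for illustration.

```python
def out_of_range_ids(input_ids, vocab_size):
    """Return the sorted token IDs that a table with vocab_size rows cannot look up."""
    return sorted(
        {tok for seq in input_ids for tok in seq if tok < 0 or tok >= vocab_size}
    )


old_vocab_size = 100       # hypothetical size before adding tokens
sos_id, eos_id = 100, 101  # the two new tokens get the next free IDs

batch = [[sos_id, 5, 17, eos_id], [sos_id, 42, eos_id]]

print(out_of_range_ids(batch, old_vocab_size))      # [100, 101]
print(out_of_range_ids(batch, old_vocab_size + 2))  # []
```

If you are using a Hugging Face model, calling `model.resize_token_embeddings(len(tokenizer))` after adding tokens is the usual way to grow the table; for a hand-written `nn.Embedding`, the `num_embeddings` argument has to be increased instead.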