[solved] Assertion `srcIndex < srcSelectDimSize` failed on GPU for `torch.cat()`

Hello, excuse me, I’d like to know how you found it? I’m facing the same problem, but I don’t know why the index could be out of bounds. Thank you!

If you are seeing this error with an nn.Embedding layer, you could add a print statement that shows the min and max values of each input batch. Some batches might contain an out-of-bounds index.
Once you have found the erroneous batch, have a look at how it was created so that you can fix the error.
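
A minimal sketch of that check (assuming a DataLoader called loader that yields (inputs, targets) pairs and an embedding with num_embeddings rows; both names are placeholders):

for i, (inputs, targets) in enumerate(loader):
    print(f"batch {i}: min index {inputs.min().item()}, max index {inputs.max().item()}")
    if inputs.min() < 0 or inputs.max() >= num_embeddings:
        print(f"batch {i} contains an out-of-bounds index!")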


Exactly, I solved my problem. Thank you very much.

How did you solve this problem? I am facing the same issue. It runs fine on the CPU but it doesn’t work on the GPU.

I think the PyTorch error messages should be improved here. Setting the wrong size for nn.Embedding throws a bunch of C++ errors and returns CUDNN_STATUS_NOT_INITIALIZED on the latest version. It is quite hard to debug this problem given these uninformative error messages.


CUDA errors can sometimes be cryptic, so I generally recommend debugging the code on the CPU, if possible. If that’s not possible, I would try to execute the script via:

CUDA_LAUNCH_BLOCKING=1 python script.py args

to get the right line of code which raised the error in the stack trace.
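
If editing the launch command is inconvenient, the same effect can be achieved from inside the script (just a sketch); the variable has to be set before CUDA is initialized:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # forces synchronous kernel launches

import torch  # import torch (or at least run the first CUDA op) only after setting the variable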


For others stumbling on this thread: be careful to choose a non-negative index (i.e. 0 instead of -1) when padding sequences that are fed to an embedding layer. Even if you specify the negative index in the embedding constructor, you will still get a runtime error on both CPU and GPU:

import torch
import torch.nn as nn

emb = nn.Embedding(20, 100, padding_idx=-1)
inp = torch.tensor([5, 2, 7, 12, 3])
bad_padding = torch.cat((inp, torch.tensor([-1] * 3)))
good_padding = torch.cat((inp, torch.tensor([0] * 3)))
out = emb(good_padding)
out = emb(bad_padding)  # RuntimeError
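
A consistent setup (just a sketch extending the snippet above) is to pad with 0 and also pass padding_idx=0 to the constructor, so the padding row is initialized to zeros and receives no gradient updates:

emb = nn.Embedding(20, 100, padding_idx=0)         # row 0 is reserved for padding
padded = torch.cat((inp, torch.tensor([0] * 3)))   # pad with the same non-negative index
out = emb(padded)                                  # no runtime error; the padding row stays fixed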

It was nice to see an explanation of why trying on the CPU first was a good start.

In my case I now see:

“RuntimeError: index out of range: Tried to access index 20000 out of table with 19999 rows. at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418”

This happens when running an LSTM defined like this:

class LSTMClassifier(nn.Module):

    """
    USAGE:
        model = LSTMClassifier(HIDDEN_SIZE, INPUT_SIZE, VOCAB_SIZE, N_LAYERS)
        model.to(DEVICE)
    """

    # Initial setup of the RNN, given the user parameters.
    # Notice we have [at least] 3 layers:
    #     1. embedding,
    #     2. encoder [x n_lstm_layers],
    #     3. predictor

    def __init__(self, hidden_size, embedding_dim, vocab_size, n_lstm_layers):
        super(LSTMClassifier, self).__init__()

        # This is the line that raises:
        # "RuntimeError: index out of range: Tried to access index 20000 out of table
        #  with 19999 rows. at .../aten/src/TH/generic/THTensorEvenMoreMath.cpp:418"
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.encoder   = nn.LSTM(input_size=embedding_dim,
                                 hidden_size=hidden_size,
                                 num_layers=n_lstm_layers)
        self.predictor = nn.Linear(hidden_size, 2)  # arg1 = input size, arg2 = number of output classes

    # This is how the model makes predictions, given an input sequence
    # (during training the output is later used to compute losses and backprop).
    def forward(self, seq):
        output, (hidden, _) = self.encoder(self.embedding(seq))
        preds = self.predictor(hidden.squeeze(0))  # squeeze the layer dimension (size 1 for a single-layer LSTM)
        return preds

There are two places this 20000 is used:

Instantiating the model:

lstm_classifier = LSTMClassifier(hidden_size=150, embedding_dim=300, vocab_size=20000, n_lstm_layers=4)

And when making the vocabulary, in the dataset generation phase:

VOCAB_SIZE = 20000
vocab_size = VOCAB_SIZE  # to restrict the vocabulary, which saves memory
TWEET.build_vocab(train, max_size = vocab_size)

Any help would be greatly appreciated!


I’ve come to learn what was going wrong…

When you build a vocabulary using the torchtext.data.Field class:

from torchtext import data
print("Building vocabulary...")
TWEET = data.Field( tokenize="spacy", lower=True ) # https://spacy.io/usage/
vocab_size = 20000  # to restrict the vocabulary, which saves memory
TWEET.build_vocab(train, max_size = vocab_size)

Since I told it that 20000 is the maximum vocab size, I would have expected the maximum input index to be 19999.

BUT

when you ask for the length of this vocabulary, it is always 2 larger than the maximum size you ASKED it to restrict the vocab to:

In [12]: len(TWEET.vocab)                                                 
Out[12]: 20002

This is because two additional special tokens are added: <unk> for unknown words and <pad> for padding.

So you need to tell your classifier this:

self.embedding = nn.Embedding(vocab_size + 2, embedding_dim)
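
Alternatively (just a sketch based on the snippets in this post), you can avoid hard-coding the +2 by sizing the embedding from the vocabulary object itself, since len(TWEET.vocab) already includes the special tokens:

# pass the actual vocabulary length when instantiating the model
model = LSTMClassifier(hidden_size=150,
                       embedding_dim=300,
                       vocab_size=len(TWEET.vocab),  # 20002 here, not 20000
                       n_lstm_layers=4)
# ... and inside __init__ simply use:
# self.embedding = nn.Embedding(vocab_size, embedding_dim)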

Full code for the LSTM class:

class LSTMClassifier(nn.Module):

    """
    USAGE:
        model = LSTMClassifier(HIDDEN_SIZE, INPUT_SIZE, VOCAB_SIZE, N_LAYERS)
        model.to(DEVICE)
    """

    # Initial setup of the RNN, given the user parameters.
    # Notice we have [at least] 3 layers:
    #     1. embedding,
    #     2. encoder [x n_lstm_layers],
    #     3. predictor

    def __init__(self, hidden_size, embedding_dim, vocab_size, n_lstm_layers):
        super(LSTMClassifier, self).__init__()

        # +2 accounts for the special tokens (<unk>, <pad>) that torchtext adds to the
        # vocabulary; nn.Embedding(vocab_size, embedding_dim) raised
        # "RuntimeError: index out of range: Tried to access index 20000 out of table with 19999 rows."
        self.embedding = nn.Embedding(vocab_size + 2, embedding_dim)
        self.encoder   = nn.LSTM(input_size=embedding_dim,
                                 hidden_size=hidden_size,
                                 num_layers=n_lstm_layers)
        self.predictor = nn.Linear(hidden_size, 2)  # arg1 = input size, arg2 = number of output classes

    # This is how the model makes predictions, given an input sequence
    # (during training the output is later used to compute losses and backprop).
    def forward(self, seq):
        try:
            output, (hidden, _) = self.encoder(self.embedding(seq))
            preds = self.predictor(hidden.squeeze(0))
        except Exception:
            # Debugging aid: inspect the largest index in each row to spot
            # out-of-range vocabulary indices, then drop into the debugger.
            print([seq[i].max().item() for i in range(seq.shape[0])])
            pdb.set_trace()  # requires `import pdb`
        return preds

I ran into the same error when using a model from Hugging Face Transformers (BertModel). The code runs fine on CPU:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [145,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/project/piqasso/tools/biaffine-parser/parser/cmds/train.py", line 82, in __call__
    self.train(train.loader)
  File "/project/piqasso/tools/biaffine-parser/parser/cmds/cmd.py", line 83, in train
    arc_scores, rel_scores = self.model(words, feats)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/piqasso/tools/biaffine-parser/parser/model.py", line 90, in forward
    feat_embed = self.feat_embed(*feats)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/piqasso/tools/biaffine-parser/parser/modules/bert.py", line 43, in forward
    bert = bert[bert_mask].split(bert_lens[mask].tolist())
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

How can I figure out what the culprit is?


I got the same error as @attardi when running the ‘bert-large-cased’ model from Hugging Face on the GPU. Is anybody aware of a solution to this problem?

By any chance, did you find out what was causing the problem?

Possibly related to https://github.com/pytorch/pytorch/issues/46020

It seems that some input instances were exceeding the maximum number of wordpiece embeddings that BERT can handle. What I did is simply check the dimensions of the input batches and pass to the model only those that do not exceed that limit.
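
A sketch of that check (loader, batch, and model are placeholder names; the limit comes from the model config):

max_len = model.config.max_position_embeddings  # 512 for bert-base / bert-large
for batch in loader:
    if batch["input_ids"].size(1) > max_len:
        continue  # skip (or truncate) over-long batches instead of hitting the device-side assert
    outputs = model(**batch)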


I’ve been having this issue with RoBERTa. The problem was that the max_position_embeddings parameter must be larger than max_seq_length; otherwise the position embeddings can generate indices > max_position_embeddings.
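
One way to guarantee this at tokenization time (a sketch, assuming a Hugging Face tokenizer and a list of strings called texts) is to let the tokenizer truncate every sequence to the model’s maximum length:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
enc = tokenizer(texts,
                truncation=True,
                max_length=tokenizer.model_max_length,  # 512 for roberta-base
                padding=True,
                return_tensors="pt")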

It could also be due to some issue with the vocab file, as it was in my case.

Has anyone found a solution, by chance? I get the same error when launching training from scratch of the Hugging Face RoBERTa and BERT models (transformers/examples/language-modeling at master · huggingface/transformers · GitHub). I receive many, many errors like this:

/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [372,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Then the stack trace:

Traceback (most recent call last):
  File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 491, in <module>
    main()
  File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 457, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1053, in train
    tr_loss += self.training_step(model, inputs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1443, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1475, in compute_loss
    outputs = model(**inputs)
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 1057, in forward
    return_dict=return_dict,
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 810, in forward
    past_key_values_length=past_key_values_length,
  File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 123, in forward
    embeddings += position_embeddings
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa4517ed1e2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fa451a3bf92 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fa4517db9cd in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #3: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x25a (0x7fa427f8489a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: c10d::Reducer::~Reducer() + 0x28a (0x7fa427f79b1a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fa427f593c2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fa4277577a6 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0xa6b08b (0x7fa427f5a08b in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x273c00 (0x7fa427762c00 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x274e4e (0x7fa427763e4e in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #22: main + 0x16e (0x400a3e in /data/medioli/env/bin/python3)
frame #23: __libc_start_main + 0xf5 (0x7fa48f4903d5 in /lib64/libc.so.6)
frame #24: /data/medioli/env/bin/python3() [0x400b02]

Hi smth, I found another bug related to this.
If I define two tensors in a Jupyter notebook, like

a = torch.randn(2, 3)
b = torch.tensor([2, 3])

where b contains indices that are out of bounds for a.
If I then run a[b] in a new cell of this notebook, the error from this topic appears.
However, when I define a new tensor c like this:

c = torch.tensor([3,3])
c = c.cuda()

the same error appears again: RuntimeError: CUDA error: device-side assert triggered.
Could you tell me how to deal with that?
Thank you!

Your index tensor contains out-of-bounds values, as PyTorch tensors use 0-based indexing. Also, once you hit a sticky CUDA error, the CUDA context is corrupted and you need to reset it (e.g. by restarting the process or the notebook kernel).
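
A minimal sketch of the bounds check, using the tensors from the post above:

import torch

a = torch.randn(2, 3)
b = torch.tensor([2, 3])

# valid indices along dim 0 of `a` are 0 and 1, so validate before indexing
if b.min() < 0 or b.max() >= a.size(0):
    raise IndexError(f"index tensor out of bounds: valid range is [0, {a.size(0) - 1}]")
x = a[b]  # only reached if the indices are valid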


Thank you for your reply! Wish you all the best!