Hello, excuse me, I'd like to know how you found it. I'm facing the same problem, but I don't know why the index could be out of bounds. Thank you!
If you are seeing this error with an nn.Embedding
layer, you could add a print statement that shows the min and max values of each input batch. Some batches might contain an out-of-bounds index.
Once you find the erroneous batch, you should have a look at how it was created so that you can fix the error.
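For anyone unsure how to add that check, here is a minimal sketch; num_embeddings and train_loader are illustrative names standing in for whatever you passed to nn.Embedding and for your own DataLoader:
num_embeddings = 20000  # illustrative: whatever was passed as nn.Embedding(num_embeddings, embedding_dim)
for batch_idx, inputs in enumerate(train_loader):  # train_loader stands in for your DataLoader
    lo, hi = inputs.min().item(), inputs.max().item()
    if lo < 0 or hi >= num_embeddings:
        print(f"batch {batch_idx} has out-of-bounds indices: min={lo}, max={hi}")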
Exactly, I solved my problem. Thank you very much.
How did you solve this problem? I am facing the same issue: it runs well on the CPU, but it doesn't work on the GPU.
I think the PyTorch error messages should be improved here. Passing the wrong number of classes to nn.Embedding throws a bunch of C++ errors and returns CUDNN_STATUS_NOT_INITIALIZED on the latest version. It's quite hard to debug this problem given such non-informative error messages.
CUDA errors can sometimes be cryptic, so I generally recommend debugging the code on the CPU, if possible. If that's not possible, I would try to execute the script via:
CUDA_LAUNCH_BLOCKING=1 python script.py args
to get the exact line of code that raised the error in the stack trace.
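If changing the launch command is inconvenient (e.g. in a notebook), the same variable can usually be set at the very top of the script instead; this is just a sketch, and the variable has to be set before any CUDA work happens:
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

import torch  # import / initialize CUDA only after the variable is set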
For others stumbling on this thread: be careful to choose a non-negative index (i.e. 0 instead of -1) when padding sequences that are fed to an embedding layer. Even if you specify the negative index in the embedding constructor, you will still get a runtime error on both CPU and GPU:
import torch
import torch.nn as nn
emb = nn.Embedding(20, 100, padding_idx=-1)
inp = torch.tensor([5, 2, 7, 12, 3])
bad_padding = torch.cat((inp, torch.tensor([-1] * 3)))
good_padding = torch.cat((inp, torch.tensor([0] * 3)))
out = emb(good_padding)
out = emb(bad_padding) # RuntimeError
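For reference, a working variant of the same snippet: keep the padding row at a non-negative index and pad with that value.
emb = nn.Embedding(20, 100, padding_idx=0)  # the padding row lives at index 0
out = emb(good_padding)  # works on CPU and GPU; the index-0 vector stays zero and receives no gradient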
It was nice to see an explanation of why trying on the CPU was a good start.
In my case I now see:
“RuntimeError: index out of range: Tried to access index 20000 out of table with 19999 rows. at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418”
when running an LSTM defined as follows:
class LSTMClassifier(nn.Module):
    """
    USAGE:
        model = LSTMClassifier( HIDDEN_SIZE, INPUT_SIZE, VOCAB_SIZE, N_LSTM_LAYERS )
        model.to( DEVICE )
    """
    # Initial setup of the RNN, given user parameters.
    # Notice we have [at least] 3 layers:
    #   1. embedding,
    #   2. encoder [x N_LAYERS],
    #   3. predictor
    def __init__(self, hidden_size, embedding_dim, vocab_size, n_lstm_layers):
        super(LSTMClassifier, self).__init__()
        # This is the line that raises the "RuntimeError: index out of range: Tried to
        # access index 20000 out of table with 19999 rows" error quoted above
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.encoder = nn.LSTM(input_size=embedding_dim,
                               hidden_size=hidden_size,
                               num_layers=n_lstm_layers)
        # arg1 = size of input, arg2 = number of output classes,
        # see: https://pytorch.org/docs/stable/nn.html (CTRL+F: "nn.Linear")
        self.predictor = nn.Linear(hidden_size, 2)

    # This is how the model makes predictions, given an input sequence
    # (during training the output is used to calculate losses & backprop)
    def forward(self, seq):
        output, (hidden, _) = self.encoder(self.embedding(seq))
        # squeeze(0) drops the leading layer dimension of `hidden`
        preds = self.predictor(hidden.squeeze(0))
        return preds
There are two places this 20000 is used:
Instantiating the model:
lstm_classifier = LSTMClassifier(hidden_size=150, embedding_dim=300, vocab_size=20000, n_lstm_layers=4)
And when making the vocabulary, in the dataset generation phase:
VOCAB_SIZE = 20000
vocab_size = VOCAB_SIZE # to restrict the vocabulary, which saves memory
TWEET.build_vocab(train, max_size = vocab_size)
Any help would greatly be appreciated!
I’ve come to learn what was going wrong…
When you build a vocabulary using the torchtext.data.Field class:
from torchtext import data
print("Building vocabulary...")
TWEET = data.Field( tokenize="spacy", lower=True ) # https://spacy.io/usage/
vocab_size = 20000 # to restrict the vocabulary, which saves memory
TWEET.build_vocab(train, max_size = vocab_size)
Since I told it 20000 is the maximum vocab size, I would have expected the maximum input index to be 19999.
BUT when you ask for the length of this vocabulary, it is always 2 larger than the maximum size you asked it to restrict the vocabulary to:
In [12]: len(TWEET.vocab)
Out[12]: 20002
This is because two additional special tokens are added: <unk> for unknown words and <pad> for padding.
So you need to tell your classifier this:
self.embedding = nn.Embedding(vocab_size + 2, embedding_dim)
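Alternatively (just a sketch, assuming the built TWEET.vocab from above is accessible where the model is defined), you can avoid hard-coding the +2 by sizing the table from the vocabulary itself:
num_embeddings = len(TWEET.vocab)  # 20002 here: max_size plus the <unk> and <pad> specials
self.embedding = nn.Embedding(num_embeddings, embedding_dim)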
Full code for the LSTM class:
class LSTMClassifier(nn.Module):
    """
    USAGE:
        model = LSTMClassifier( HIDDEN_SIZE, INPUT_SIZE, VOCAB_SIZE, N_LSTM_LAYERS )
        model.to( DEVICE )
    """
    # Initial setup of the RNN, given user parameters.
    # Notice we have [at least] 3 layers:
    #   1. embedding,
    #   2. encoder [x N_LAYERS],
    #   3. predictor
    def __init__(self, hidden_size, embedding_dim, vocab_size, n_lstm_layers):
        super(LSTMClassifier, self).__init__()
        # +2 accounts for the <unk> and <pad> specials that torchtext adds,
        # so the table size matches len(TWEET.vocab) and the "index 20000" error disappears
        self.embedding = nn.Embedding(vocab_size + 2, embedding_dim)
        self.encoder = nn.LSTM(input_size=embedding_dim,
                               hidden_size=hidden_size,
                               num_layers=n_lstm_layers)
        # arg1 = size of input, arg2 = number of output classes,
        # see: https://pytorch.org/docs/stable/nn.html (CTRL+F: "nn.Linear")
        self.predictor = nn.Linear(hidden_size, 2)

    # This is how the model makes predictions, given an input sequence
    # (during training the output is used to calculate losses & backprop)
    def forward(self, seq):
        try:
            output, (hidden, _) = self.encoder(self.embedding(seq))
            # squeeze(0) drops the leading layer dimension of `hidden`
            preds = self.predictor(hidden.squeeze(0))
        except RuntimeError:
            # debugging aid: show the largest index in each sequence, then drop into the debugger
            print([seq[i].max().item() for i in range(seq.shape[0])])
            import pdb; pdb.set_trace()
        return preds
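A quick sanity check of the fix (a sketch with illustrative sizes; n_lstm_layers=1 is used so that hidden.squeeze(0) actually removes the layer dimension):
model = LSTMClassifier(hidden_size=150, embedding_dim=300, vocab_size=20000, n_lstm_layers=1)
batch = torch.randint(0, 20002, (35, 8))  # token indices up to 20001 now fit in the 20002-row table
preds = model(batch)                      # shape: (8, 2)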
I ran into the same error when using a model from Huggingface Transformers (BertModel). The code runs fine on the CPU, but on the GPU I get:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [145,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
File "run.py", line 58, in <module>
cmd(args)
File "/project/piqasso/tools/biaffine-parser/parser/cmds/train.py", line 82, in __call__
self.train(train.loader)
File "/project/piqasso/tools/biaffine-parser/parser/cmds/cmd.py", line 83, in train
arc_scores, rel_scores = self.model(words, feats)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/project/piqasso/tools/biaffine-parser/parser/model.py", line 90, in forward
feat_embed = self.feat_embed(*feats)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/project/piqasso/tools/biaffine-parser/parser/modules/bert.py", line 43, in forward
bert = bert[bert_mask].split(bert_lens[mask].tolist())
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
How can I figure out what the culprit is?
I got the same error as @attardi when running the 'bert-large-cased' model from HuggingFace on the GPU. Is anybody aware of a solution to this problem?
By chance, did you find out what was causing the problem?
It seems that some input instances were exceeding the maximum number of wordpiece embeddings that BERT can handle. What I did was simply check the dimensions of the input batches and pass to the model only those that did not exceed that limit.
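As a sketch of that check (the names model and input_ids are illustrative, not from the original code), the limit can be read from the model config before the forward pass:
max_len = model.config.max_position_embeddings  # 512 for bert-base/large
if input_ids.size(1) > max_len:
    # either skip the batch, as described above, or truncate it
    # (the attention mask and any other per-token inputs need the same treatment)
    input_ids = input_ids[:, :max_len]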
I've been having this issue with RoBERTa. The problem was that the max_position_embeddings parameter must be larger than max_seq_length; otherwise the position embedding can generate indices > max_position_embeddings.
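If I remember correctly, RoBERTa starts its position ids after the padding index, which is why roberta-base ships with 514 position slots for 512 tokens. A sketch of a config that leaves that headroom (max_seq_length is an illustrative value):
from transformers import RobertaConfig

max_seq_length = 512  # illustrative value
config = RobertaConfig(max_position_embeddings=max_seq_length + 2)  # two extra slots for the position-id offset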
It could also be due to some issue with the vocab file, as it was in my case.
Has anyone found a solution, by chance? I get the same error when launching training from scratch of the huggingface models RoBERTa and BERT (transformers/examples/language-modeling at master · huggingface/transformers · GitHub). I received many, many of these errors:
/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [372,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Then the stack trace:
Traceback (most recent call last):
File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 491, in <module>
main()
File "/data/medioli/transformers/examples/language-modeling/run_mlm.py", line 457, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1053, in train
tr_loss += self.training_step(model, inputs)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1443, in training_step
loss = self.compute_loss(model, inputs)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/trainer.py", line 1475, in compute_loss
outputs = model(**inputs)
File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/parallel/distributed.py", line 511, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 1057, in forward
return_dict=return_dict,
File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 810, in forward
past_key_values_length=past_key_values_length,
File "/data/medioli/env/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/medioli/env/lib/python3.6/site-packages/transformers/models/roberta/modeling_roberta.py", line 123, in forward
embeddings += position_embeddings
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa4517ed1e2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7fa451a3bf92 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fa4517db9cd in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libc10.so)
frame #3: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x25a (0x7fa427f8489a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: c10d::Reducer::~Reducer() + 0x28a (0x7fa427f79b1a in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fa427f593c2 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fa4277577a6 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0xa6b08b (0x7fa427f5a08b in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x273c00 (0x7fa427762c00 in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x274e4e (0x7fa427763e4e in /data/medioli/env/lib64/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #22: main + 0x16e (0x400a3e in /data/medioli/env/bin/python3)
frame #23: __libc_start_main + 0xf5 (0x7fa48f4903d5 in /lib64/libc.so.6)
frame #24: /data/medioli/env/bin/python3() [0x400b02]
Hi smth, I found another bug related to this.
If I define two tensors in a Jupyter notebook, like
a = torch.randn(2, 3)
b = torch.tensor([2, 3])
where b contains indices that are out of bounds for a,
and I then run a[b]
in a new cell of the notebook, the error from this topic appears.
However, when I define a new tensor c like this:
c = torch.tensor([3,3])
c = c.cuda()
the same error appears again: RuntimeError: CUDA error: device-side assert triggered.
Could you tell me how to deal with that?
Thank you!
Your index tensor contains out-of-bounds values, since PyTorch tensors use 0-based indexing. Once you hit a sticky CUDA error, the CUDA context is corrupted and you would need to reset it (e.g. by restarting the notebook kernel).
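A quick way to catch this before the device-side assert fires, sketched with the tensors from the question, is to check the index range on the CPU first:
a = torch.randn(2, 3)
b = torch.tensor([2, 3])
# valid row indices for a are 0 and 1, so this assertion fails and points at the real problem
assert b.min() >= 0 and b.max() < a.size(0), "index tensor is out of bounds"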
Thank you for your reply! Wish you all the best!