Hello everybody,
I’m training a fairly complex model on Colab. After adding a positional embedding to the other embeddings (word, character, tag, etc.), I received the following error:
```
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [26,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```
This line is repeated several times and in the end I get:
```
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=313 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 236, in <module>
    loss = pat.train_conll(batch)
  File ".../originalModel.py", line 249, in train_conll
    y_pred1, y_pred2 = self.forward(sentences)
  File ".../originalModel.py", line 472, in forward
    we = torch.cat((position, we), 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/THCGeneral.cpp:313
```
I got this output by running the Python script with CUDA_LAUNCH_BLOCKING=1.
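For reference, this is roughly how I set the flag (it just has to be set before any CUDA work is done, so at the very top of train.py is the safe option). Running the same batch on CPU should also turn the device-side assert into a plain IndexError that points at the exact lookup.

```python
import os

# set before importing torch / before any CUDA call,
# e.g. at the very top of train.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# equivalent from the shell:
#   CUDA_LAUNCH_BLOCKING=1 python train.py
```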
The piece of code that gives me this error is:
```python
def forward(self, sentences):
    orig_w = [[e.form for e in sentence] for sentence in sentences]
    w, t, x_lengths = self.sentence2tok_tags(sentences)
    batch_size, seq_len = w.size()

    # (batch_size, seq_len) -> (batch_size, seq_len, embedding_dim)
    we = self.word_embedding(w)
    t = self.tag_embedding(t)

    if self.position_emb:
        # get positional embeddings
        print(w.min(), w.max())  # 0, 378
        position = self.positional_embedding(w)
        # concat positional embeddings with word embeddings
        we = torch.cat((position, we), 2)  # <-- the error is raised here
    # concat tag embeddings and word embeddings
    x = torch.cat((we, t), 2)
```
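To narrow it down, I added a quick sanity check before each embedding lookup; check_indices is just a hypothetical helper I used while debugging, not part of the model:

```python
def check_indices(idx, emb, name):
    # debugging helper: verify that every index fits inside the embedding table
    lo, hi = int(idx.min()), int(idx.max())
    assert 0 <= lo and hi < emb.num_embeddings, (
        f"{name}: indices span [{lo}, {hi}] but num_embeddings is {emb.num_embeddings}"
    )

# called right before each lookup in forward(), e.g.:
#   check_indices(w, self.word_embedding, "word")
#   check_indices(t, self.tag_embedding, "tag")
```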
After looking for some solutions on Google and on this forum, I realized that this error occurs because w contains indices outside the range of my embedding: the maximum value of w is 380, while my embedding table only has 150 entries:
```python
class PositionalEmbeddings(nn.Module):
    def __init__(self, emb_size, max_position, pad_index):
        super().__init__()
        self.emb_size = emb_size            # 20
        self.max_position = max_position    # 150
        self.pad_index = pad_index
        self.embeddings = nn.Embedding(
            num_embeddings=self.max_position,  # 150
            embedding_dim=self.emb_size,       # 20
            padding_idx=0,
        )

    def forward(self, batch):
        # get positions ignoring pads
        positions = self.get_positions(batch, self.pad_index)
        # get embeddings
        embeddings = self.embeddings(positions)
        return embeddings

    def get_positions(self, batch, pad_index):
        batch_size, sentence_max_length = batch.shape  # 64 (batch size) x 73 (max length)
        # get positions
        positions = torch.arange(1, sentence_max_length + 1).expand(batch_size, -1).long().to(self.device)
        # get mask from tensor
        mask = positions * 0 + pad_index
        # fill mask
        mask = ~mask.ne(batch)  # 1 - mask.ne(batch)
        # mask pad words
        positions[mask] = 0
        return positions
```
By increasing the size of my embedding (num_embeddings) to 380, the code works.
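Concretely, the only change for this workaround is in how the module is constructed, e.g. (the pad_index value here is just a placeholder):

```python
# positional table large enough for the largest index I observed
positional_embedding = PositionalEmbeddings(emb_size=20, max_position=380, pad_index=0)
```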
Now my question is the following: is there any way I can keep the size of my embedding as it was originally (150) and have the positional embedding applied only where w has a value between 0 and 150 (something like the sketch below is what I have in mind)? If so, and if that makes sense, do you have any ideas on how to do it? Otherwise, if I have to keep 380, what size should I give the positional embedding (20 seems too small if the table goes up to 380)?
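To make the first option concrete, this is roughly the idea for a plain nn.Embedding (a standalone sketch, not wired into my model; I do not know whether it makes sense for positional information, which is part of my question):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=150, embedding_dim=20, padding_idx=0)

idx = torch.tensor([[3, 17, 380, 42]])  # 380 would be out of range for a table of 150
# map every out-of-range index to the padding row (row 0, all zeros) before the lookup
safe_idx = torch.where(idx < emb.num_embeddings, idx, torch.zeros_like(idx))
out = emb(safe_idx)  # shape (1, 4, 20), no index error
```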
Thank you all!