Cuda runtime error (710) : device-side assert triggered at

Elidor · September 25, 2020, 10:38am

Hello everybody,

I’m training a fairly complex model on Colab. After adding a positional embedding to the various embeddings (word, character, tag, etc.), I received the following error:

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [26,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

This line is repeated several times and in the end I get:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=313 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 236, in <module>
    loss = pat.train_conll(batch)
  File ".../originalModel.py", line 249, in train_conll
    y_pred1, y_pred2 = self.forward(sentences)
  File ".../originalModel.py", line 472, in forward
    we = torch.cat((position, we), 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/THCGeneral.cpp:313

I got this output by running the Python script with CUDA_LAUNCH_BLOCKING=1.
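For reference, here is a minimal sketch of two common ways to get a more precise error for this kind of failure. It assumes a single script or a fresh Colab runtime. First, CUDA_LAUNCH_BLOCKING only takes effect if it is set before CUDA is initialized, so setting it before importing torch is the simplest way to guarantee that. Second, the same lookup run on CPU raises a Python IndexError instead of a device-side assert. The embedding size (150), the toy batch, and the helper check_embedding_input are hypothetical, chosen to mirror the sizes mentioned below.

```python
import os

# Must be set before CUDA is initialized; doing it before `import torch`
# (e.g. at the top of a fresh Colab runtime) is the simplest way to ensure that.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
import torch.nn as nn


def check_embedding_input(emb: nn.Embedding, idx: torch.Tensor) -> None:
    """Hypothetical helper: fail with a readable message before the lookup."""
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= emb.num_embeddings:
        raise ValueError(
            f"indices span [{lo}, {hi}] but num_embeddings={emb.num_embeddings}"
        )


emb = nn.Embedding(num_embeddings=150, embedding_dim=20, padding_idx=0)
w = torch.tensor([[5, 17, 378], [42, 9, 0]])  # toy batch with one out-of-range index

# On CPU the lookup raises a Python IndexError instead of a device-side assert.
try:
    emb(w)
    cpu_error = None
except IndexError as e:
    cpu_error = e

# The explicit check reports the offending range before any lookup happens.
try:
    check_embedding_input(emb, w)
    check_error = None
except ValueError as e:
    check_error = e

print(type(cpu_error).__name__)  # IndexError
print(check_error)  # indices span [0, 378] but num_embeddings=150
```

Calling a check like this right before an embedding lookup turns the asynchronous device-side assert into an immediate, readable exception. The cost is one device-to-host synchronization per call when the tensor lives on the GPU.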

The piece of code that gives me this error is:

    def forward(self, sentences):
        orig_w = [[e.form for e in sentence] for sentence in sentences] 
        w, t, x_lengths = self.sentence2tok_tags(sentences)

        batch_size, seq_len = w.size()
        # (batch_size, seq_len) -> (batch_size, seq_len, embedding_dim)
        we = self.word_embedding(w)
        t = self.tag_embedding(t)

        if self.position_emb:
            # get positional embeddings
            print(w.min(), w.max())  # 0, 378
            position = self.positional_embedding(w)
            # concat positional embeddings with word embeddings

            we = torch.cat((position, we), 2)  # the index error is reported here

        # concat tags embeddings and word embeddings
        x = torch.cat((we, t), 2)

After searching for solutions on Google and on this forum, I realized that this error occurs because w contains indices outside the range of my embedding. The maximum value of w is 380, while my embedding has only 150 entries:

    import torch
    import torch.nn as nn

    class PositionalEmbeddings(nn.Module):
        def __init__(self, emb_size, max_position, pad_index):
            super().__init__()

            self.emb_size = emb_size                # 20
            self.max_position = max_position        # 150
            self.pad_index = pad_index

            self.embeddings = nn.Embedding(
                num_embeddings=self.max_position,   # 150
                embedding_dim=self.emb_size,        # 20
                padding_idx=0,                      # position 0 is reserved for pads
            )

        def forward(self, batch):
            # get positions, ignoring pads
            positions = self.get_positions(batch, self.pad_index)
            # look up the positional embeddings
            return self.embeddings(positions)

        def get_positions(self, batch, pad_index):
            batch_size, sentence_max_length = batch.shape  # 64 (batch size) x 73 (max length)
            # positions 1..sentence_max_length for every sentence, on the batch's device
            # (self.device is never set in this class); repeat() copies the data,
            # unlike expand(), so the in-place write below is safe on any device
            positions = torch.arange(1, sentence_max_length + 1, device=batch.device).repeat(batch_size, 1)
            # set the position of every pad token to 0 (the padding_idx)
            positions[batch.eq(pad_index)] = 0
            return positions
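The standalone sketch below uses the same logic as get_positions above. It takes the device from the input batch, since self.device is not set in the class, and it shows which values actually reach self.embeddings. They are bounded by the padded sentence length, not by the word indices. The toy batch is hypothetical.

```python
import torch


def get_positions(batch: torch.Tensor, pad_index: int) -> torch.Tensor:
    # 1..seq_len for every row, created on the same device as the input batch
    batch_size, seq_len = batch.shape
    positions = torch.arange(1, seq_len + 1, device=batch.device).repeat(batch_size, 1)
    # padding tokens get position 0, the embedding's padding_idx
    return positions.masked_fill(batch.eq(pad_index), 0)


# Toy padded batch of word indices (pad_index = 0); note the large word id 378.
w = torch.tensor([[5, 17, 378, 0],
                  [42, 9, 0, 0]])
positions = get_positions(w, pad_index=0)
print(positions.tolist())    # [[1, 2, 3, 0], [1, 2, 0, 0]]
print(int(positions.max()))  # 3: bounded by seq_len, not by the word ids
```

With a maximum sentence length of 73, these positions stay below max_position = 150. That may help when tracing where an index like 378 enters an embedding of size 150.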

If I increase the size of my embedding (num_embeddings) to 380, the code works.

Now my question is the following:
Is there a way to keep the size of my embedding at its original value (150) and use the embedding only where w has a value in its valid range (0 to 149)? If so, and if it makes sense, do you have any ideas on how this could be done? Otherwise, if I have to keep 380, what dimension should I use for the positional embedding (20 seems too small for a table with 380 entries)?

Thank you all!

ptrblck · September 26, 2020, 2:20am

This would depend on your use case. How did you create the embedding inputs, i.e. do the values depend on the data, or did you create them in another way?

Elidor · September 27, 2020, 2:14pm

The values depend strongly on the data. The function sentence2tok_tags(sentences) returns w, a padded matrix of word indices; t, a padded matrix of tag indices; and x_lengths, an array with the sentence lengths:

    def sentence2tok_tags(self, sentences):
        w = [[e.norm for e in sentence] for sentence in sentences]
        w, x_lengths = self.prepare(w, self.word_vocab)
        t = [[e.get_partofspeech_tag(self.partofspeech_type) for e in sentence] for sentence in sentences]
        t, _ = self.prepare(t, self.tag_vocab)
        return w, t, x_lengths

where

    def prepare(self, sentences, vocab):
        x = [torch.tensor([vocab[w] for w in sentence]).to(self.device) for sentence in sentences]
        x_lengths = np.array([len(sentence) for sentence in x])
        padded_x = torch.nn.utils.rnn.pad_sequence(x, batch_first=True)
        return padded_x, x_lengths
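Here is a self-contained sketch of prepare. It assumes a plain dict as the vocabulary and a device argument in place of self.device (both hypothetical here). The point it makes concrete: the entries of w are vocabulary ids, so max(w) can reach len(vocab) - 1 no matter how short the sentences are.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence


def prepare(sentences, vocab, device="cpu"):
    # map every token to its vocabulary id, one 1-D tensor per sentence
    x = [torch.tensor([vocab[tok] for tok in sentence], device=device) for sentence in sentences]
    x_lengths = np.array([len(sentence) for sentence in x])
    # pad to the longest sentence; pad_sequence pads with 0 by default
    padded_x = pad_sequence(x, batch_first=True)
    return padded_x, x_lengths


# Hypothetical toy vocabulary: id 0 is reserved for padding.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "down": 4}
w, lengths = prepare([["the", "cat", "sat", "down"], ["the", "cat"]], vocab)
print(w.tolist())                # [[1, 2, 3, 4], [1, 2, 0, 0]]
print(lengths.tolist())          # [4, 2]
print(int(w.max()), len(vocab))  # 4 5
```

Because pad_sequence pads with 0 by default, id 0 doubles as the padding value. The toy vocabulary reserves it for `<pad>` for that reason.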

ptrblck · September 27, 2020, 9:48pm

If the indices are defined by the vocabulary, I don’t think it’s a good idea to cut them somehow, as that part of the vocabulary wouldn’t be used during training.
Instead of changing the input, I would recommend adapting num_embeddings in the embedding layer to match the number of words (indices).
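Below is a minimal sketch of that recommendation. It assumes the vocabulary maps tokens to contiguous ids 0..len(vocab) - 1, with 0 reserved for padding. The toy word_vocab and the embedding_dim of 20 are hypothetical. The idea is to size num_embeddings from the vocabulary rather than from a fixed constant, and optionally to check the indices on the host before the lookup.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary with contiguous ids 0..len(vocab)-1 (0 = padding).
word_vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "down": 4}

# Size the lookup table from the vocabulary instead of a fixed constant.
word_embedding = nn.Embedding(
    num_embeddings=len(word_vocab),
    embedding_dim=20,
    padding_idx=word_vocab["<pad>"],
)

w = torch.tensor([[1, 2, 3, 4], [1, 2, 0, 0]])
# Cheap sanity check that fails on the host instead of as a device-side assert.
assert int(w.max()) < word_embedding.num_embeddings
we = word_embedding(w)
print(tuple(we.shape))  # (2, 4, 20)
```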

Elidor · September 28, 2020, 2:55pm

ok, I’ll try to do this. Thank you!