After fine-tuning a pre-trained word embedding, I get: CUDA runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:21

I know this error is common, but I am unable to solve this specific problem. I’ve checked this topic and verified both of the usual suspects, label_size and vocab_size. I would be glad if anyone could point out the problem; I’ve been stuck for two days already. The error started to show up when I fine-tuned a pre-trained fastText embedding. Please let me know if you need any more information.

Below is the error traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-68-10826427ec3f> in <module>
     17         # print("Shape : " , _.shape)
     18 
---> 19         predictions, h = model(inp.permute(1,0).to(device), lens, device  ) # TODO:don't need _   #Changed Permute
     20 
     21         # print(targ)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

<ipython-input-64-bc1668a9efcd> in forward(self, x, lens, device)
     32     def forward(self, x, lens, device ):
     33         x = self.embedding(x)
---> 34         self.hidden = self.initialize_hidden_state(device)
     35         h = self.initialize_hidden_state(device)
     36 

<ipython-input-64-bc1668a9efcd> in initialize_hidden_state(self, device)
     22     def initialize_hidden_state(self, device):
     23 #         weight = next(self.parameters()).data
---> 24         return torch.zeros(((self.n_layers, self.batch_sz, self.hidden_units))).to(device)
     25 #         if (device == "cuda:0"):
     26 #             hidden = (weight.new(self.n_layers, batch_sz, self.hidden_units).zero_().cuda(),

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:21

Here is the part where I fine-tune the fastText word embedding

!pip install bnlp_toolkit
from bnlp.bengali_fasttext import Bengali_Fasttext
bft = Bengali_Fasttext()

model_name = "/content/drive/My Drive/Research_Shanto/pretrained/saved_model_39.bin"
data = "/content/drive/My Drive/Research_Shanto/Datasets/Ashik Bhai_Sentiment/corpus_39.txt"
epoch = 50
bft.train_fasttext(data, model_name, epoch)

# fText here is gensim's FastText class, imported in an earlier cell, e.g.:
# from gensim.models import FastText as fText
fastText_wv = fText.load_fasttext_format("../input/pretrained/saved_model_39.bin")
weights = torch.FloatTensor(fastText_wv.wv.vectors)
print(weights.shape)
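
As a quick sanity check (a sketch; this assumes gensim’s pre-4.0 API, where fastText_wv.wv.vocab is a dict), it is worth confirming that the matrix has exactly one row per vocabulary word before building any index mapping:

# Hedged sanity check: one embedding row per vocabulary word.
print(len(fastText_wv.wv.vocab), weights.shape[0])
assert weights.shape[0] == len(fastText_wv.wv.vocab)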


Here is the part where I merge it with the data by building a vocabulary index

# This class creates a word -> index mapping (e.g., "dad" -> 5) and vice-versa
# (e.g., 5 -> "dad") for the dataset
class ConstructVocab():
    def __init__(self, sentences):
        self.sentences = sentences
        self.word2idx = {}
        self.idx2word = {}
        self.vocab = set()
        self.create_index()

    def create_index(self):
        # update with individual tokens
        self.vocab.update(self.sentences)

        # sort the vocab
        self.vocab = sorted(self.vocab)
        print(self.vocab)

        # add a padding token with index 0
        self.word2idx['<pad>'] = 0

        # word to index mapping
        for index, word in enumerate(self.vocab):
            self.word2idx[word] = index + 1  # +1 because of pad token

        # index to word mapping
        for word, index in self.word2idx.items():
            self.idx2word[index] = word

inputs = ConstructVocab(fastText_wv.wv.vocab.keys())
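
To catch indexing mismatches early, one can also verify that the largest index the mapping can produce is still a valid row of the embedding matrix (a sketch, using the weights tensor loaded above):

# Because of the extra <pad> entry at index 0, word2idx can produce
# indices up to len(vocab); an embedding lookup needs max_idx < weights.shape[0].
max_idx = max(inputs.word2idx.values())
print(max_idx, weights.shape[0])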

With vocab_inp_size = len(inputs.word2idx), below is the architecture

class EmoLSTM(nn.Module):
    def __init__(self, embedding_matrix, vocab_size, embedding_dim, hidden_units, batch_sz, n_layers, seqLength, device, output_size):
        super(EmoLSTM, self).__init__()
        self.batch_sz = batch_sz
        self.hidden_units = hidden_units
        self.embedding_dim = embedding_dim
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.seqLength = seqLength
        
        # layers
        
        self.embedding = nn.Embedding.from_pretrained(torch.FloatTensor(embedding_matrix))
        self.dropout = nn.Dropout(p=0.5)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_units, self.n_layers, bidirectional = False)
        
        self.fc = nn.Linear(self.hidden_units, self.output_size)
    
    def initialize_hidden_state(self, device):
        return torch.zeros(((self.n_layers, self.batch_sz, self.hidden_units))).to(device)
    
    def forward(self, x, lens, device):
        x = self.embedding(x)                               # (seq_len, batch, embedding_dim)
        self.hidden = self.initialize_hidden_state(device)  # initial hidden state
        h = self.initialize_hidden_state(device)            # initial cell state

        output, _ = self.lstm(x, (self.hidden, h))
        out = output[-1, :, :]                              # last time step (batch_first=False)
        out = self.fc(out)
        return out, _

And now the training and validation part

EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()
    
    ### Initialize hidden state
    # TODO: do initialization here.
    total_loss = 0
    train_accuracy, val_accuracy = 0, 0
    ### Training
    for (batch, (inp, targ, lens)) in enumerate(train_dataset):
        loss = 0
        
        predictions, h = model(inp.permute(1,0).to(device), lens, device  ) # TODO:don't need _   #Changed Permute
           
        loss += loss_function(targ.to(device), predictions)
        batch_loss = (loss / int(targ.shape[1]))        
        total_loss += batch_loss
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        batch_accuracy = accuracy(targ.to(device), predictions)
        train_accuracy += batch_accuracy
        
        if batch % 100 == 0:
            print('Epoch {} Batch {} Val. Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.cpu().detach().numpy()))
            
    ### Validating
    for (batch, (inp, targ, lens)) in enumerate(val_dataset):       
        predictions,val_ = model(inp.permute(1,0).to(device), lens, device)      #Changed Permute  
        batch_accuracy = accuracy(targ.to(device), predictions)
        val_accuracy += batch_accuracy
    
    print('Epoch {} Loss {:.4f} -- Train Acc. {:.4f} -- Val Acc. {:.4f}'.format(epoch + 1, 
                                                             total_loss / TRAIN_N_BATCH, 
                                                             train_accuracy / TRAIN_N_BATCH,
                                                             val_accuracy / VAL_N_BATCH))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Thank you in advance.

Is your code running fine on the CPU? A CPU run would most likely give you a better stack trace, but I assume your indexing might be wrong.

If your CPU code works, could you run the script with CUDA_LAUNCH_BLOCKING=1 python script.py args and post the stack trace here, please?
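
If you are running in a notebook instead of a script, you could alternatively set the environment variable in the first cell, before any CUDA call:

import os
# Must be set before CUDA is initialized, so that kernel launches run
# synchronously and the stack trace points to the failing op.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'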

The code runs fine on the CPU, although Kaggle could not allocate enough memory because my dataset is huge. So I shrank it to a smaller size, and it ran smoothly even with the command you mentioned. But when I run it on the GPU with this command on the whole dataset, the exact same error shows up. I am posting the error again.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-bf1cd602e30e> in <module>
    540         # print("Shape : " , _.shape)
    541 
--> 542         predictions, h = model(inp.permute(1,0).to(device), lens, device  ) # TODO:don't need _   #Changed Permute
    543 
    544         # print(targ)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

<ipython-input-3-bf1cd602e30e> in forward(self, x, lens, device)
    449     def forward(self, x, lens, device ):
    450         x = self.embedding(x)
--> 451         self.hidden = self.initialize_hidden_state(device)
    452         h = self.initialize_hidden_state(device)
    453 

<ipython-input-3-bf1cd602e30e> in initialize_hidden_state(self, device)
    439     def initialize_hidden_state(self, device):
    440 #         weight = next(self.parameters()).data
--> 441         return torch.zeros(((self.n_layers, self.batch_sz, self.hidden_units))).to(device)
    442 #         if (device == "cuda:0"):
    443 #             hidden = (weight.new(self.n_layers, batch_sz, self.hidden_units).zero_().cuda(),

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:21

Thanks for the update!
Could you post the shape of a random input tensor that reproduces this error with your current model, so that we can debug it, please?

The initial shape of my input tensor is [seq_length, batch_size], which is [414, 128], and when the error occurs the input shape is the same, [414, 128].

Moreover, when I use torch==0.4.1, the error shows up after one complete epoch, but when I use torch==1.4.0, it shows up before one complete epoch. I don’t know whether this is a matter of concern, since different versions can handle data differently, but it felt important to point out.

Thanks for the information.
I cannot reproduce the error with this minimal code snippet:

import torch
import torch.nn as nn

class EmoLSTM(nn.Module):
    def __init__(self, embedding_matrix, vocab_size, embedding_dim, hidden_units, batch_sz, n_layers, seqLength, device, output_size):
        super(EmoLSTM, self).__init__()
        self.batch_sz = batch_sz
        self.hidden_units = hidden_units
        self.embedding_dim = embedding_dim
        self.vocab_size = vocab_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.seqLength = seqLength

        # layers

        self.embedding = nn.Embedding.from_pretrained(torch.FloatTensor(embedding_matrix))
        self.dropout = nn.Dropout(p=0.5)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_units, self.n_layers, bidirectional = False)

        self.fc = nn.Linear(self.hidden_units, self.output_size)

    def initialize_hidden_state(self, device):
        return torch.zeros(((self.n_layers, self.batch_sz, self.hidden_units))).to(device)

    def forward(self, x, device):
        x = self.embedding(x)
        self.hidden = self.initialize_hidden_state(device)
        h = self.initialize_hidden_state(device)

        output, _ = self.lstm(x, (self.hidden,h) )
        out = output[-1, : , :]
        out = self.fc(out)
        return out, _


num_embeddings = 100
embedding_dim = 300
seq_len = 414
batch_size = 128
device = 'cuda'

model = EmoLSTM(
    embedding_matrix=torch.randn(num_embeddings, embedding_dim),
    vocab_size=num_embeddings,
    embedding_dim=embedding_dim,
    hidden_units=2,
    batch_sz=batch_size,
    n_layers=1,
    seqLength=seq_len,
    device=device,
    output_size=2).to(device)


x = torch.randint(0, num_embeddings, (seq_len, batch_size))
predictions, h = model(x.to(device), device)

Does this code yield the same error on your setup?
If not, could you check what the difference might be between the code snippets?

Okay, I just increased my “smaller dataset” size, ran it on the CPU, and got the following error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-56-7ab077e1fecf> in <module>
     20 # h= tuple([each.data for each in h])
     21 
---> 22 output, _ = model(xs.to(device), lens, device,inputs.idx2word )
     23 print(output.size())
     24 print(model)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

<ipython-input-54-8bffe47c2362> in forward(self, x, lens, device, idx2word)
     36 
     37             print("----------------")
---> 38         x = self.embedding(x)
     39         self.hidden = self.initialize_hidden_state(device)
     40         h = self.initialize_hidden_state(device)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/sparse.py in forward(self, input)
    112         return F.embedding(
    113             input, self.weight, self.padding_idx, self.max_norm,
--> 114             self.norm_type, self.scale_grad_by_freq, self.sparse)
    115 
    116     def extra_repr(self):

/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482         # remove once script supports set_grad_enabled
   1483         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485 
   1486 

RuntimeError: index out of range: Tried to access index 9164 out of table with 9163 rows. at /opt/conda/conda-bld/pytorch_1579022119164/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

Even though it says it tried to access an index outside the table, I was able to print the whole input tensor back as sentences just before x = self.embedding(x) (line 38 in the traceback above).

How did you increase the input tensor?
Could you check its min and max values and compare them to the shape of the embedding matrix?
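
Something along these lines (a sketch using the inp batch and weights matrix from your snippets) should reveal a mismatch:

# Every index fed to nn.Embedding must satisfy 0 <= idx < num_rows.
print(inp.min().item(), inp.max().item())  # index range in this batch
print(weights.shape[0])                    # rows in the embedding matrix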

Okay, I have fixed the problem. Since I kept index 0 for <pad>, my incremented indices reached the last index, i.e. the number of rows in the embedding matrix. Thus, whenever a dataset contained the last word of the embedding matrix, it threw this error.

Fix: I assigned index 0 to the last word in the matrix, and now the code runs fine for all datasets.
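
For anyone hitting the same off-by-one: an alternative fix (a sketch, not exactly what I did) is to prepend a zero row to the embedding matrix, so that row 0 belongs to <pad> and the shifted indices 1..len(vocab) all stay in range:

import torch

# Prepend a zero vector for <pad> (index 0); the matrix then has
# len(vocab) + 1 rows, matching the indices produced by word2idx.
pad_row = torch.zeros(1, weights.shape[1])
weights_padded = torch.cat([pad_row, weights], dim=0)
embedding = torch.nn.Embedding.from_pretrained(weights_padded)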

Thank you very much for your time and for showing me the right places to look for the bug. Since PyTorch is a framework, errors like this aren’t very transparent, which can demotivate coders from tracking them down. Kudos.