I'm training a model; after running for 50k to 0.5 million epochs, it fails with the same error again and again. I've attached the terminal error and the piece of code where the error occurs. Kindly help me!

The code is attached below:

```
def train_batch_MLE(self, enc_out, enc_hidden, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, batch):
    '''Calculate the negative log-likelihood loss for the given batch. To reduce exposure bias,
    the previously generated token is passed as input with probability 0.25 instead of the ground-truth label.

    Args:
    :param enc_out: Encoder outputs for all time steps (batch_size, length_input_sequence, 2*hidden_size)
    :param enc_hidden: Tuple containing the final hidden state & cell state of the encoder. Shape of h & c: (batch_size, hidden_size)
    :param enc_padding_mask: Mask for the encoder input; tensor of size (batch_size, length_input_sequence) with 0 for pad tokens & 1 for others
    :param ct_e: Encoder context vector for time_step=0 (eq. 5 in https://arxiv.org/pdf/1705.04304.pdf)
    :param extra_zeros: Tensor used to extend the vocab distribution for the pointer mechanism
    :param enc_batch_extend_vocab: Input batch that stores OOV ids
    :param batch: Batch object
    '''
    dec_batch, max_dec_len, dec_lens, target_batch = get_dec_data(batch)  # Get input and target batches for training the decoder

    step_losses = []
    copy_loss = []
    s_t = (enc_hidden[0], enc_hidden[1])  # Decoder hidden states
    x_t = get_cuda(T.LongTensor(len(enc_out)).fill_(self.start_id))  # Input to the decoder
    prev_s = None  # Used for intra-decoder attention (section 2.2 in https://arxiv.org/pdf/1705.04304.pdf)
    sum_temporal_srcs = None  # Used for intra-temporal attention (section 2.1 in https://arxiv.org/pdf/1705.04304.pdf)

    for t in range(min(max_dec_len, config.max_dec_steps)):
        use_ground_truth = get_cuda((T.rand(len(enc_out)) > 0.25)).long()  # Per-example flags: use the ground-truth label instead of the previously decoded token
        x_t = use_ground_truth * dec_batch[:, t] + (1 - use_ground_truth) * x_t  # Select the decoder input based on use_ground_truth
        x_t = self.model.embeds(x_t)
        final_dist, s_t, ct_e, sum_temporal_srcs, prev_s = self.model.decoder(x_t, s_t, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, sum_temporal_srcs, prev_s)
        target = target_batch[:, t]
        log_probs = T.log(final_dist + config.eps)
        step_loss = F.nll_loss(log_probs, target, reduction="none", ignore_index=self.pad_id)
        step_losses.append(step_loss)
        x_t = T.multinomial(final_dist, 1).squeeze()  # Sample words from the final distribution to use as input at the next time step
        is_oov = (x_t >= config.vocab_size).long()  # Mask indicating whether a sampled word is OOV
        x_t = (1 - is_oov) * x_t.detach() + is_oov * self.unk_id  # Replace OOVs with the [UNK] token
        copy_loss.append(1 - is_oov)

    losses = T.sum(T.stack(step_losses, 1), 1)  # Unnormalized losses for each example in the batch; (batch_size)
    batch_avg_loss = losses / dec_lens  # Normalized losses; (batch_size)
    mle_loss = T.mean(batch_avg_loss)  # Average batch loss
    # copy_losses = T.sum(T.stack(copy_loss, 1), 1)  # Unnormalized losses for each example in the batch; (batch_size)
    # batch_avg_copy_loss = copy_losses.float() / dec_lens  # Normalized losses; (batch_size)
    # avg_copy_loss = T.mean(batch_avg_copy_loss)
    # mle_loss += avg_copy_loss * 5
    return mle_loss
```
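For context, the `use_ground_truth` trick in the loop is a scheduled-sampling-style mix: each batch element keeps its ground-truth token with probability 0.75, otherwise it is fed its own previous prediction. A pure-Python sketch of just that selection logic (the function name `mix_tokens` is mine, not from the code above):

```python
import random

def mix_tokens(ground_truth, prev_generated, p_prev=0.25, rng=random):
    """Per position, keep the ground-truth token with probability
    1 - p_prev; otherwise feed back the previously generated token
    (reduces exposure bias during training)."""
    return [gt if rng.random() > p_prev else prev
            for gt, prev in zip(ground_truth, prev_generated)]

# Each output element is either the ground-truth id or the previous prediction
mixed = mix_tokens([10, 11, 12, 13], [20, 21, 22, 23], rng=random.Random(0))
```

In the attached code the same selection is done vectorized with a 0/1 tensor mask, which is why `use_ground_truth` is cast to `.long()` before multiplying.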
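The final loss is computed by summing the per-step NLL over time for each example, dividing by that example's target length (`dec_lens`), and averaging over the batch. A minimal sketch of that normalization with plain Python lists (the helper name `batch_mle_loss` is mine):

```python
def batch_mle_loss(step_losses, dec_lens):
    """step_losses: one list per decoding step, each holding the
    per-example NLL at that step; dec_lens: target lengths per example."""
    per_example = [sum(ex) for ex in zip(*step_losses)]            # sum over time steps
    normalized = [s / l for s, l in zip(per_example, dec_lens)]    # divide by target length
    return sum(normalized) / len(normalized)                       # mean over the batch

# Two steps, two examples: per-example sums are 4.0 and 6.0,
# lengths are both 2, so the batch loss is (2.0 + 3.0) / 2 = 2.5
loss = batch_mle_loss([[1.0, 2.0], [3.0, 4.0]], [2, 2])
```

Length normalization keeps short and long targets on the same scale, so one long sequence cannot dominate the batch average.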