I wrote a seq2seq model and tried to implement minimum risk training (Eq. (13) in the paper "Minimum Risk Training for Neural Machine Translation").
I added torch.autograd.set_detect_anomaly(True) at the beginning of the model.
It output the following error:
RuntimeError: Function 'ExpBackward' returned nan values in its 0th output.
According to the traceback, it has something to do with the second line of the code below:
seq_nll = seq_nll - torch.max(seq_nll, dim=-1)[0].unsqueeze(1)
seq_probs = torch.pow(torch.exp(seq_nll), 0.005)
normalizer = torch.sum(seq_probs, dim=-1).view(-1, 1)
seq_nll is a tensor of shape (64, 3) filled with very negative numbers such as [-94.5122, -50.0515, -76.2685]. These numbers are the log-likelihoods of different sequences.
The exp operation converts those log-likelihoods into sequence probabilities, the pow operation scales (smooths) those probabilities, and the normalizer is the sum of the re-scaled probabilities.
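If my math is right, the three lines should be equivalent to a softmax, since exp(x)**a == exp(a * x): the normalized weights are just softmax(0.005 * seq_nll). Here is a minimal sketch of that equivalence (the 0.005 smoothing factor is from my code, and the example row is the one from my tensor above):

```python
import torch

alpha = 0.005  # smoothing factor from my MRT setup
seq_nll = torch.tensor([[-94.5122, -50.0515, -76.2685]])  # example log-likelihoods

# what my current code computes
shifted = seq_nll - torch.max(seq_nll, dim=-1)[0].unsqueeze(1)
seq_probs = torch.pow(torch.exp(shifted), alpha)
normalizer = torch.sum(seq_probs, dim=-1).view(-1, 1)
weights = seq_probs / normalizer

# since exp(x) ** alpha == exp(alpha * x), the same normalized weights
# come out of a softmax over the scaled log-likelihoods, which stays in
# log space and avoids the underflowing exp(...) intermediate
weights_stable = torch.softmax(alpha * seq_nll, dim=-1)
```

The intermediate torch.exp(shifted) underflows to exactly 0 for very negative entries, which is what I suspect trips up the backward pass; the softmax form never materializes those tiny values.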
I guess the problem is related to those very negative numbers. Is there a numerically stable way to implement the code above?