I wrote a seq2seq model and tried to implement minimum risk training (Eq. (13) in the paper "Minimum Risk Training for Neural Machine Translation").
I set torch.autograd.set_detect_anomaly(True) at the beginning of the model, and it reported the following error:
RuntimeError: Function 'ExpBackward' returned nan values in its 0th output.
According to the traceback, the error has something to do with the second line of the code below:
seq_nll = seq_nll - torch.max(seq_nll, dim=-1).values.unsqueeze(1)
seq_probs = torch.pow(torch.exp(seq_nll), 0.005)
normalizer = torch.sum(seq_probs, dim=-1).view(-1, 1)
seq_nll is a tensor of shape (64, 3) containing very negative numbers, e.g.
[ -94.5122, -50.0515, -76.2685].
These numbers are the log-likelihoods of different sequences.
The exp operation converts them into the probabilities of those sequences.
The pow operation re-scales the probabilities.
normalizer is the sum of those re-scaled probabilities.
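For what it's worth, I think the exp/pow/normalize chain is mathematically equivalent to a softmax over the scaled log-likelihoods, since exp(x)**a / sum(exp(x)**a) == softmax(a * x). A minimal sketch of that equivalence (the tensor values and the 0.005 exponent are taken from my snippet above):

```python
import torch

alpha = 0.005  # the scaling exponent from the snippet

seq_nll = torch.tensor([[-94.5122, -50.0515, -76.2685]])

# Naive path: exponentiate first, then re-scale and normalize.
# This can underflow badly for very negative log-likelihoods.
naive = torch.exp(seq_nll) ** alpha
naive = naive / naive.sum(dim=-1, keepdim=True)

# Equivalent path: scale the log-likelihoods first, then softmax,
# which applies the log-sum-exp trick internally.
stable = torch.softmax(alpha * seq_nll, dim=-1)

print(torch.allclose(naive, stable, atol=1e-4))  # True for these values

# For much more negative inputs the naive path collapses to 0/0 = nan,
# while the softmax path stays finite.
extreme = torch.tensor([[-9000.0, -8000.0, -8500.0]])
print(torch.softmax(alpha * extreme, dim=-1))
```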
I guess the problem here is related to those very negative numbers.
Is there a numerically stable way to implement the code above?