Hello, guys!
I am incorporating copy mechanism into transformer model and use CrossEntropy Loss to do optimization.
Typically, I would use
loss_fn = torch.nn.NLLloss()
log_softmax = torch.nn.LogSoftmax()
predictions = log_softmax(logits)
loss = loss_fn(predictions,targets)
but in copy mechanism: I have to calculate log probability myself by
probability = p_gen * generation_probability + (1-p_gen)*copy_probability
loss_fn = torch.nn.NLLloss()
loss = loss_fn(torch.log(probability),targets)
which means I can not use log_softmax which can bring numerical stability .
So my training process get nan problem , what should I do to avoid this?
BTW,i use fp16.