My Transformer NMT model is giving a "nan" loss value

Which mask was it: the padding mask, or the triangular mask you used for causality? What exactly was the problem, and how did you fix it?
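For anyone else who lands here with the same symptom: one common way either mask can lead to NaN (not necessarily what happened in this thread) is an attention row in which every position ends up masked. The softmax then runs over a row of all `-inf` and returns NaN, which propagates into the loss. Below is a minimal pure-Python sketch of that effect. The names `softmax` and `masked_softmax` are made up for illustration and are not taken from any framework.

```python
import math

NEG_INF = float("-inf")

def softmax(scores):
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_softmax(scores, keep):
    # keep[i] is True where attention is allowed; masked positions are set to -inf.
    masked = [s if k else NEG_INF for s, k in zip(scores, keep)]
    return softmax(masked)

scores = [0.5, 1.0, -0.3]

# A row with at least one visible position behaves normally:
# the masked position gets weight 0 and the rest sum to 1.
ok = masked_softmax(scores, [True, True, False])

# A row where every position is masked (for example, a padding mask and a
# causal mask that together leave nothing visible) yields NaN everywhere,
# because (-inf) - (-inf) is NaN.
bad = masked_softmax(scores, [False, False, False])

print(ok)
print(bad)
```

If this is the cause, checking whether any row of the combined mask has no allowed positions is a quick first test before looking at learning rate or other sources of NaN.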