My loss becomes NaN during training. It is caused by a tensor in which some elements are zero, but they shouldn't be zero. When I print the tensor, I see a very strange phenomenon. Look at this:
print(alpha)
result:
tensor([[8.4926e-06, 2.9750e-04, 2.0732e-05, 6.5416e-05, 1.2785e-04, 1.3783e-04,
8.0852e-05, 2.9128e-04, 8.3976e-06, 3.4069e-05],
[4.3417e-06, 4.6478e-06, 4.7960e-07, 1.1161e-05, 2.4620e-06, 3.6034e-03,
2.8802e-06, 3.2489e-07, 7.1982e-06, 7.9321e-06],
[3.5168e-04, 1.3959e-04, 6.6393e-03, 9.1526e-04, 1.3291e-04, 3.9203e-05,
2.0094e-05, 8.3743e-05, 1.3102e-04, 1.3114e-04],
[2.2118e-05, 3.0005e-05, 4.5028e-05, 2.7926e-02, 4.7457e-04, 1.3916e-04,
3.8518e-06, 1.7940e-05, 6.8158e-05, 2.4488e-05],
[1.1020e-04, 9.7328e-06, 9.2700e-05, 2.2556e-02, 1.9584e-04, 2.0269e-04,
6.9351e-06, 1.5777e-05, 2.5748e-04, 4.8471e-05],
[5.4854e-03, 6.5477e-06, 1.3129e-04, 2.8175e-05, 1.2210e-05, 8.0755e-06,
9.8790e-05, 3.7378e-06, 3.2873e-04, 3.7017e-05],
[3.2331e-04, 2.5080e-06, 7.0140e-06, 1.2707e-05, 5.7030e-05, 6.1795e-04,
1.0593e-02, 1.4990e-06, 7.0081e-05, 1.4437e-05],
[4.5361e-05, 1.7738e-04, 5.7259e-06, 1.1173e-04, 1.4167e-04, 2.3912e-02,
7.2580e-05, 5.9300e-06, 2.1757e-05, 4.4645e-05],
[8.0816e-05, 9.9628e-06, 4.9268e-05, 2.9979e-04, 4.9817e-06, 2.5931e-05,
3.2751e-05, 1.4161e-05, 2.5827e-02, 3.5003e-04],
[1.7151e-05, 4.0785e-06, 8.5164e-05, 3.1453e-04, 9.0293e-06, 1.0992e-05,
2.4123e-06, 1.2716e-05, 4.0623e-04, 2.6726e-05],
[4.6351e-04, 3.3088e-05, 4.1036e-04, 2.5175e-04, 2.6937e-05, 5.9167e-05,
1.1110e-04, 2.1697e-05, 7.3982e-03, 1.1573e-04],
[1.6273e-02, 1.0944e-05, 3.8956e-04, 9.3451e-06, 1.0117e-05, 1.1785e-06,
3.2926e-05, 3.4684e-06, 1.2162e-05, 1.8236e-05]],
dtype=torch.float64)
for i in range(len(alpha)):
    print(alpha[i])
result:
tensor([8.4926e-06, 2.9750e-04, 2.0732e-05, 6.5416e-05, 1.2785e-04, 1.3783e-04,
8.0852e-05, 2.9128e-04, 8.3976e-06, 3.4069e-05], dtype=torch.float64)
tensor([4.3417e-06, 4.6478e-06, 4.7960e-07, 1.1161e-05, 2.4620e-06, 3.6034e-03,
2.8802e-06, 3.2489e-07, 7.1982e-06, 7.9321e-06], dtype=torch.float64)
tensor([0.0004, 0.0001, 0.0066, 0.0009, 0.0001, 0.0000, 0.0000, 0.0001, 0.0001,
0.0001], dtype=torch.float64)
tensor([2.2118e-05, 3.0005e-05, 4.5028e-05, 2.7926e-02, 4.7457e-04, 1.3916e-04,
3.8518e-06, 1.7940e-05, 6.8158e-05, 2.4488e-05], dtype=torch.float64)
tensor([1.1020e-04, 9.7328e-06, 9.2700e-05, 2.2556e-02, 1.9584e-04, 2.0269e-04,
6.9351e-06, 1.5777e-05, 2.5748e-04, 4.8471e-05], dtype=torch.float64)
tensor([5.4854e-03, 6.5477e-06, 1.3129e-04, 2.8175e-05, 1.2210e-05, 8.0755e-06,
9.8790e-05, 3.7378e-06, 3.2873e-04, 3.7017e-05], dtype=torch.float64)
tensor([3.2331e-04, 2.5080e-06, 7.0140e-06, 1.2707e-05, 5.7030e-05, 6.1795e-04,
1.0593e-02, 1.4990e-06, 7.0081e-05, 1.4437e-05], dtype=torch.float64)
tensor([4.5361e-05, 1.7738e-04, 5.7259e-06, 1.1173e-04, 1.4167e-04, 2.3912e-02,
7.2580e-05, 5.9300e-06, 2.1757e-05, 4.4645e-05], dtype=torch.float64)
tensor([8.0816e-05, 9.9628e-06, 4.9268e-05, 2.9979e-04, 4.9817e-06, 2.5931e-05,
3.2751e-05, 1.4161e-05, 2.5827e-02, 3.5003e-04], dtype=torch.float64)
tensor([1.7151e-05, 4.0785e-06, 8.5164e-05, 3.1453e-04, 9.0293e-06, 1.0992e-05,
2.4123e-06, 1.2716e-05, 4.0623e-04, 2.6726e-05], dtype=torch.float64)
tensor([0.0005, 0.0000, 0.0004, 0.0003, 0.0000, 0.0001, 0.0001, 0.0000, 0.0074,
0.0001], dtype=torch.float64)
tensor([1.6273e-02, 1.0944e-05, 3.8956e-04, 9.3451e-06, 1.0117e-05, 1.1785e-06,
3.2926e-05, 3.4684e-06, 1.2162e-05, 1.8236e-05], dtype=torch.float64)
See, there are differences! When I print a single row (like alpha[2]), the values are truncated to four decimal places, and some of them show up as zeros. I am confused by this. Can anyone help me?