My loss becomes NaN during training. It is caused by a tensor in which some elements are zero, but they shouldn't be zero. When I print the tensor, I see a very strange phenomenon. Look at this:
print(alpha)
result:
tensor([[8.4926e-06, 2.9750e-04, 2.0732e-05, 6.5416e-05, 1.2785e-04, 1.3783e-04,
8.0852e-05, 2.9128e-04, 8.3976e-06, 3.4069e-05],
[4.3417e-06, 4.6478e-06, 4.7960e-07, 1.1161e-05, 2.4620e-06, 3.6034e-03,
2.8802e-06, 3.2489e-07, 7.1982e-06, 7.9321e-06],
[3.5168e-04, 1.3959e-04, 6.6393e-03, 9.1526e-04, 1.3291e-04, 3.9203e-05,
2.0094e-05, 8.3743e-05, 1.3102e-04, 1.3114e-04],
[2.2118e-05, 3.0005e-05, 4.5028e-05, 2.7926e-02, 4.7457e-04, 1.3916e-04,
3.8518e-06, 1.7940e-05, 6.8158e-05, 2.4488e-05],
[1.1020e-04, 9.7328e-06, 9.2700e-05, 2.2556e-02, 1.9584e-04, 2.0269e-04,
6.9351e-06, 1.5777e-05, 2.5748e-04, 4.8471e-05],
[5.4854e-03, 6.5477e-06, 1.3129e-04, 2.8175e-05, 1.2210e-05, 8.0755e-06,
9.8790e-05, 3.7378e-06, 3.2873e-04, 3.7017e-05],
[3.2331e-04, 2.5080e-06, 7.0140e-06, 1.2707e-05, 5.7030e-05, 6.1795e-04,
1.0593e-02, 1.4990e-06, 7.0081e-05, 1.4437e-05],
[4.5361e-05, 1.7738e-04, 5.7259e-06, 1.1173e-04, 1.4167e-04, 2.3912e-02,
7.2580e-05, 5.9300e-06, 2.1757e-05, 4.4645e-05],
[8.0816e-05, 9.9628e-06, 4.9268e-05, 2.9979e-04, 4.9817e-06, 2.5931e-05,
3.2751e-05, 1.4161e-05, 2.5827e-02, 3.5003e-04],
[1.7151e-05, 4.0785e-06, 8.5164e-05, 3.1453e-04, 9.0293e-06, 1.0992e-05,
2.4123e-06, 1.2716e-05, 4.0623e-04, 2.6726e-05],
[4.6351e-04, 3.3088e-05, 4.1036e-04, 2.5175e-04, 2.6937e-05, 5.9167e-05,
1.1110e-04, 2.1697e-05, 7.3982e-03, 1.1573e-04],
[1.6273e-02, 1.0944e-05, 3.8956e-04, 9.3451e-06, 1.0117e-05, 1.1785e-06,
3.2926e-05, 3.4684e-06, 1.2162e-05, 1.8236e-05]],
dtype=torch.float64)
for i in range(len(alpha)):
    print(alpha[i])
result:
tensor([8.4926e-06, 2.9750e-04, 2.0732e-05, 6.5416e-05, 1.2785e-04, 1.3783e-04,
8.0852e-05, 2.9128e-04, 8.3976e-06, 3.4069e-05], dtype=torch.float64)
tensor([4.3417e-06, 4.6478e-06, 4.7960e-07, 1.1161e-05, 2.4620e-06, 3.6034e-03,
2.8802e-06, 3.2489e-07, 7.1982e-06, 7.9321e-06], dtype=torch.float64)
tensor([0.0004, 0.0001, 0.0066, 0.0009, 0.0001, 0.0000, 0.0000, 0.0001, 0.0001,
0.0001], dtype=torch.float64)
tensor([2.2118e-05, 3.0005e-05, 4.5028e-05, 2.7926e-02, 4.7457e-04, 1.3916e-04,
3.8518e-06, 1.7940e-05, 6.8158e-05, 2.4488e-05], dtype=torch.float64)
tensor([1.1020e-04, 9.7328e-06, 9.2700e-05, 2.2556e-02, 1.9584e-04, 2.0269e-04,
6.9351e-06, 1.5777e-05, 2.5748e-04, 4.8471e-05], dtype=torch.float64)
tensor([5.4854e-03, 6.5477e-06, 1.3129e-04, 2.8175e-05, 1.2210e-05, 8.0755e-06,
9.8790e-05, 3.7378e-06, 3.2873e-04, 3.7017e-05], dtype=torch.float64)
tensor([3.2331e-04, 2.5080e-06, 7.0140e-06, 1.2707e-05, 5.7030e-05, 6.1795e-04,
1.0593e-02, 1.4990e-06, 7.0081e-05, 1.4437e-05], dtype=torch.float64)
tensor([4.5361e-05, 1.7738e-04, 5.7259e-06, 1.1173e-04, 1.4167e-04, 2.3912e-02,
7.2580e-05, 5.9300e-06, 2.1757e-05, 4.4645e-05], dtype=torch.float64)
tensor([8.0816e-05, 9.9628e-06, 4.9268e-05, 2.9979e-04, 4.9817e-06, 2.5931e-05,
3.2751e-05, 1.4161e-05, 2.5827e-02, 3.5003e-04], dtype=torch.float64)
tensor([1.7151e-05, 4.0785e-06, 8.5164e-05, 3.1453e-04, 9.0293e-06, 1.0992e-05,
2.4123e-06, 1.2716e-05, 4.0623e-04, 2.6726e-05], dtype=torch.float64)
tensor([0.0005, 0.0000, 0.0004, 0.0003, 0.0000, 0.0001, 0.0001, 0.0000, 0.0074,
0.0001], dtype=torch.float64)
tensor([1.6273e-02, 1.0944e-05, 3.8956e-04, 9.3451e-06, 1.0117e-05, 1.1785e-06,
3.2926e-05, 3.4684e-06, 1.2162e-05, 1.8236e-05], dtype=torch.float64)
See, there are differences! When I print a single row (like alpha[2]), the values are truncated to four decimal places, and some of them show up as zeros. I am confused by this. Can anyone help me?