When running F.cross_entropy with ignore_index set and reduction='none', I get inconsistent results in the ignored positions. For example:
In [547]: scores
Out[547]:
tensor([[0.6453, 0.0335, 0.8880, 0.7096, 0.6212],
[0.3379, 0.0559, 0.8633, 0.3935, 0.2379],
[0.8813, 0.2753, 0.9798, 0.4739, 0.1096],
[0.3567, 0.8076, 0.1867, 0.1396, 0.0971],
[0.6270, 0.1037, 0.1204, 0.2856, 0.8055],
[0.4446, 0.0313, 0.6597, 0.9088, 0.3563],
[0.6929, 0.4242, 0.9795, 0.7334, 0.0760],
[0.2506, 0.3346, 0.2624, 0.8448, 0.4569],
[0.9056, 0.2093, 0.7258, 0.8028, 0.3810],
[0.4425, 0.4331, 0.1988, 0.1506, 0.1138]])
In [548]: gold
Out[548]: tensor([ 2, 0, 0, 4, -1, 1, 0, 0, -1, 2])
In [549]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[549]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 9.9920e-16, 2.1011e+00,
1.5427e+00, 1.8149e+00, 0.0000e+00, 1.6885e+00])
In [550]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[550]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 4.2039e-45, 2.1011e+00,
1.5427e+00, 1.8149e+00, 0.0000e+00, 1.6885e+00])
In [551]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[551]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 2.4612e-28, 2.1011e+00,
1.5427e+00, 1.8149e+00, 0.0000e+00, 1.6885e+00])
In [552]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[552]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 7.0065e-45, 2.1011e+00,
1.5427e+00, 1.8149e+00, 2.2695e-21, 1.6885e+00])
In [553]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[553]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 4.2039e-45, 2.1011e+00,
1.5427e+00, 1.8149e+00, 0.0000e+00, 1.6885e+00])
In [554]: F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')
Out[554]:
tensor([1.3382e+00, 1.6877e+00, 1.3288e+00, 1.8669e+00, 8.1065e+16, 2.1011e+00,
1.5427e+00, 1.8149e+00, 2.3510e-38, 1.6885e+00])
In the last call, one of the ignored positions (index 4) had a loss of about 8e16, and the values at the ignored positions change between otherwise identical calls.
So, is this expected behavior, given that these positions are meant to be ignored and the reduction apparently works fine? Even if it is, I find it very strange that the ignored positions are not always zero.
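In the meantime, here is a minimal sketch of a workaround, assuming the values at the ignored positions are simply garbage that should be treated as zero: build a mask from the targets and overwrite those positions explicitly. The names `mask`, `masked_loss`, and `mean_loss` are my own, and the random `scores` only stand in for the tensor shown above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.rand(10, 5)  # stand-in for the scores tensor above
gold = torch.tensor([2, 0, 0, 4, -1, 1, 0, 0, -1, 2])

# Per-element loss; entries where gold == -1 are supposed to be ignored.
loss = F.cross_entropy(scores, gold, ignore_index=-1, reduction='none')

# True where the target is a real class, False where it is ignored.
mask = gold != -1

# torch.where instead of loss * mask: multiplying would turn inf or nan
# at an ignored position into nan rather than zero.
masked_loss = torch.where(mask, loss, torch.zeros_like(loss))

# Manual mean over the non-ignored positions only.
mean_loss = masked_loss.sum() / mask.sum()
```

This leaves the non-ignored entries untouched, so it should be safe to apply whether or not the underlying behavior is considered a bug.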