From the source: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#cross_entropy
def cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100,
                  reduce=None, reduction='mean'):
Is there a specific reason that this is -100?
I strongly doubt that there is any special reason for -100 specifically.
A value like -100 is a reasonable default because it is highly unlikely that anyone would pass such a value by mistake (unlike -1, which could easily be the result of an off-by-one calculation).
There should be no difference at all if you set ignore_index=-40 and pass a target value of -40 for all the samples you want excluded from the loss and gradient calculation.
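To illustrate why the sentinel value itself doesn't matter, here is a minimal pure-Python sketch (not PyTorch's actual implementation) of softmax cross-entropy with `reduction='mean'` and an `ignore_index` parameter: ignored samples are simply skipped, so any sentinel value that cannot collide with a real class index behaves identically.

```python
import math

def cross_entropy_mean(logits, targets, ignore_index=-100):
    # Mean negative log-likelihood over samples whose target != ignore_index.
    # This is a simplified sketch of the semantics, not PyTorch's implementation.
    total, count = 0.0, 0
    for row, t in zip(logits, targets):
        if t == ignore_index:
            continue  # ignored samples contribute nothing to the loss or gradient
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[t]  # -log softmax(row)[t]
        count += 1
    return total / count

logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.5, 1.5, 1.5]]

# Same data, same ignored sample; only the sentinel value differs.
a = cross_entropy_mean(logits, [0, -100, 1], ignore_index=-100)
b = cross_entropy_mean(logits, [0, -40, 1], ignore_index=-40)
print(abs(a - b))  # the two losses are identical
```

The sentinel never enters the arithmetic; it only flags which rows to skip, which is why -100 vs. -40 makes no difference as long as the targets use the matching value.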