I noticed that the cross_entropy_loss function has an ignore_index keyword that defaults to -100. Isn’t this kind of dangerous if people are trying to train a classifier with more than 100 classes (e.g. classifiers on the ImageNet data)? Won’t one of their classes be silently ignored? I tried to trace this keyword through the source code to see where it’s actually used, but I eventually reach a torch._C._nn module that I can’t find, so I can’t confirm whether this is a potential issue (it seems like it would be hard to make it not one).
I wonder about this too. Looking at the THNN code ( https://github.com/torch/nn/blob/master/lib/THNN/generic/ClassNLLCriterion.c ; I have no idea what happens in other backends), it seems to literally compare the two signed integers (the current class and ignore_index) in C, so a negative ignore_index should simply never match a valid (non-negative) class.
Of course, I could be wrong, so an official answer, or a note in the docs that negative numbers are not interpreted as in Python but as in C, might be useful.
It seems this is the actual class value that will be ignored, rather than the index of that class. So if your classes run from 0 to num_class-1, any negative ignore_index is equivalent. I too am wondering why it is set to this particular value rather than a simple None.
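For anyone reading later, here is a minimal pure-Python sketch of the behaviour described above. It is not PyTorch's implementation: the function name nll_loss_sketch and the made-up log-probability values are hypothetical, and the only assumption is that a target is skipped exactly when its value equals ignore_index.

```python
def nll_loss_sketch(log_probs, targets, ignore_index=-100):
    """Mean negative log-likelihood, skipping targets equal to ignore_index.

    Hypothetical helper for illustration only, not the PyTorch code.
    """
    total, count = 0.0, 0
    for row, target in zip(log_probs, targets):
        # Plain equality on the target value; nothing is done with
        # "the 100th class" or with Python-style negative indexing.
        if target == ignore_index:
            continue
        total -= row[target]
        count += 1
    return total / count if count else float("nan")


num_classes = 150
# Made-up "log-probabilities": class c gets the value -(c + 1) in every row.
log_probs = [[-float(c + 1) for c in range(num_classes)] for _ in range(3)]

valid_targets = [100, 7, 149]    # class 100 is an ordinary class here
padded_targets = [100, 7, -100]  # the last sample is marked "ignore"

# Class 100 still contributes under the default ignore_index=-100.
loss_default = nll_loss_sketch(log_probs, valid_targets)
# Any other negative value behaves the same when all targets are valid.
loss_minus_one = nll_loss_sketch(log_probs, valid_targets, ignore_index=-1)
# Only a target whose value is literally -100 is dropped from the mean.
loss_padded = nll_loss_sketch(log_probs, padded_targets)
```

With these inputs loss_default and loss_minus_one are identical, while passing ignore_index=100 would actually drop class 100. So the default only matters if you deliberately put -100 into your targets, e.g. as padding.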
I’m a bit late, but I don’t fully understand this claim. Why would two signed integers never match for negative values?
I guess the default value might have been a convenient “invalid” long to use without overloading the argument list to long, and a None object etc.
My guess is that it’s thus just for legacy and backwards compatibility reasons, and if this argument were added nowadays, better overloading might have been used.