Is zero weight in cross entropy the same as ignore index?

hadaev8 · December 26, 2019, 9:05am

I need an auxiliary classification on very imbalanced data, and I don't need the major class.
Before, I had a linear layer with 9 neurons and ignore index -1.
But now I think it is better to have 10 classes (these values are still used in another part of the model).
Is setting the weight to zero for the major class the same as ignoring it with ignore index, as I did before?

ptrblck · December 27, 2019, 3:34am

ignore_index can be set to a valid class index as well as to a value outside the valid range, and targets with this value will be ignored.
E.g. if you are dealing with 10 classes, your model output would have a dimension of [batch_size, 10] in the vanilla multi-class classification setup, while the target will be a LongTensor of shape [batch_size] containing class indices in the range [0, 9].
If your target contains any values outside this range, you’ll get a RuntimeError:

RuntimeError: Expected cur_target >= 0 && cur_target < n_classes to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

Now, if you specify a special class index in ignore_index, your target might also contain this index without throwing an error.
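As a minimal sketch of the setup from the question (9 output neurons, -1 as the ignored target value; the tensor shapes and values are made up for illustration), the out-of-range target fails without ignore_index and is skipped with it. The exception type seems to differ between PyTorch versions, so the sketch catches both IndexError and RuntimeError:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 9 real classes; -1 marks targets that should not contribute to the loss.
logits = torch.randn(4, 9)
target = torch.tensor([0, 3, -1, 8])

# Without ignore_index=-1, the value -1 is out of range and the call fails.
try:
    nn.CrossEntropyLoss()(logits, target)
    raised = False
except (IndexError, RuntimeError):
    raised = True

# With ignore_index=-1, the third sample is skipped and the mean is taken
# over the remaining three samples.
loss_ignored = nn.CrossEntropyLoss(ignore_index=-1)(logits, target)

# Reference: drop the ignored sample by hand and use the plain loss.
keep = target != -1
loss_manual = nn.CrossEntropyLoss()(logits[keep], target[keep])

print(raised, torch.allclose(loss_ignored, loss_manual))
```

Both printed values should be True if the ignored sample is excluded from the sum and from the averaging.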

On the other hand, setting a weight of zero for a specific valid class means that every sample with this target contributes zero loss and therefore zero gradients.
Setting ignore_index to the same valid class will yield the same behavior.

While the use cases might be different, if you use a valid class index for both approaches, you should see the same result.
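A quick way to check this is to compute the loss and the gradients both ways on the same logits. The sketch below assumes the default reduction='mean', a weight of one for all other classes, and a hypothetical major class at index 0; the targets are made up:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 10
major_class = 0  # hypothetical index of the majority class

# Two independent copies of the same logits, so gradients can be compared.
logits_a = torch.randn(6, num_classes, requires_grad=True)
logits_b = logits_a.detach().clone().requires_grad_(True)
target = torch.tensor([0, 2, 0, 5, 9, 0])

# Approach 1: zero weight for the major class, weight one elsewhere.
weight = torch.ones(num_classes)
weight[major_class] = 0.0
loss_weight = nn.CrossEntropyLoss(weight=weight)(logits_a, target)
loss_weight.backward()

# Approach 2: ignore the major class via ignore_index.
loss_ignore = nn.CrossEntropyLoss(ignore_index=major_class)(logits_b, target)
loss_ignore.backward()

same_loss = torch.allclose(loss_weight, loss_ignore)
same_grad = torch.allclose(logits_a.grad, logits_b.grad)
# Rows whose target is the major class receive no gradient at all.
zero_rows = bool((logits_a.grad[target == major_class] == 0).all())

print(same_loss, same_grad, zero_rows)
```

With reduction='mean', the weighted loss is divided by the sum of the weights of the targets, which here equals the number of non-ignored samples, so the two losses should match.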