Question about BCE* losses interface and features

This is my personal opinion and others might have different preferences, so take it with a grain of salt :wink:

For a multi-class classification, I would use nn.CrossEntropyLoss, which also provides the ignore_index argument. This makes sense, as e.g. if I’m dealing with 1000 classes, I might just want to ignore a certain one.
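A minimal sketch of what that looks like (the shapes and the ignored class index are just assumptions for illustration):

```python
import torch
import torch.nn as nn

# 1000-class logits for a batch of 4 samples
logits = torch.randn(4, 1000)
targets = torch.tensor([3, 999, 42, 7])  # class indices

# Samples whose target equals ignore_index (here 999) don't contribute to the loss
criterion = nn.CrossEntropyLoss(ignore_index=999)
loss = criterion(logits, targets)
```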

In a binary classification, you could still use nn.CrossEntropyLoss with two outputs (or more, if you additionally want to ignore a class via ignore_index), or alternatively nn.BCE(WithLogits)Loss.
An ignore_index argument doesn’t really make sense in the latter case, since the targets are float values and we are using a single output neuron, which gives us the probability (or logit) of the positive class. Ignoring a class in a binary setup seems a bit strange, and it might be simpler to just calculate the loss for a single class instead (if that’s the use case).
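Here is a small sketch of the single-output binary case with nn.BCEWithLogitsLoss (shapes and values are made up):

```python
import torch
import torch.nn as nn

# A single output neuron giving the logit of the positive class
logits = torch.randn(4, 1)                         # raw model outputs (logits)
targets = torch.tensor([[1.], [0.], [1.], [0.]])   # float targets in [0, 1]

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)
```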

For a multi-label classification, I would also use nn.BCE(WithLogits)Loss, where each neuron corresponds to the probability (logit) of the corresponding class.
Ignoring certain classes in this use case could in fact make sense.
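If you do need to ignore certain classes there, one option is to use reduction='none' and mask the per-element losses yourself. A rough sketch, assuming 5 classes and ignoring class 2 purely for illustration:

```python
import torch
import torch.nn as nn

# Multi-label setup: one logit per class
logits = torch.randn(4, 5)
targets = torch.randint(0, 2, (4, 5)).float()  # multi-hot float targets

# There is no ignore_index here, but you can emulate it with a mask
ignore = torch.zeros(4, 5, dtype=torch.bool)
ignore[:, 2] = True                            # e.g. ignore class 2 for all samples

criterion = nn.BCEWithLogitsLoss(reduction='none')
loss = criterion(logits, targets)              # per-element losses, shape [4, 5]
loss = loss[~ignore].mean()                    # average only over the kept entries
```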