Classification loss for "any class but class X"?

I am trying to train a classifier on a set of classes.

I then want to use the classifier to generate a loss such that, for a given sample, the loss is high only if the classifier returns a specific unwanted class. Ideally the classifier would predict some other class with high “probability”/“confidence”, but which class it is does not matter, as long as it is not the specified unwanted class.
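To make this concrete, here is a small illustration (placeholder names and made-up numbers, not code from my project) of which predictions should be penalized:

```python
import torch

# Illustration of the behaviour I am after. `unwanted` is the index of class X.
# The loss should be high for `bad` and low for either `good` prediction;
# which non-X class wins does not matter.
unwanted = 1

bad   = torch.tensor([0.05, 0.90, 0.05])   # confidently predicts class X  -> high loss
good1 = torch.tensor([0.90, 0.05, 0.05])   # confidently predicts another class -> low loss
good2 = torch.tensor([0.05, 0.05, 0.90])   # a different non-X class is just as fine -> low loss

def acceptable(probs: torch.Tensor) -> bool:
    """A prediction is acceptable as long as class X is not the argmax."""
    return probs.argmax().item() != unwanted

print(acceptable(bad), acceptable(good1), acceptable(good2))  # False True True
```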

In my initial attempt I used CELoss to train the classifier and then tried to use the negative of that loss, -CELoss, to achieve my goal. However, this seems to take its lowest possible value when all classification scores are at 0.5.
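Roughly, the attempt looked like the following sketch (an approximation of what I did, with assumed shapes and class index, not my exact code):

```python
import torch
import torch.nn.functional as F

# Sketch of the attempt described above: cross entropy toward the unwanted
# class X, then negated in the hope of pushing predictions away from X.
logits = torch.randn(4, 5, requires_grad=True)        # batch of 4 samples, 5 classes
unwanted = torch.full((4,), 2, dtype=torch.long)      # class X = 2 for every sample

ce_to_x = F.cross_entropy(logits, unwanted)           # standard CE with X as the target
loss = -ce_to_x                                       # negated CE, used as the training signal
loss.backward()
```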

Any ideas on how this can be achieved?

Hi Zimo!

Why not train your classifier to predict “class X” and then just
post-process your prediction to predict some other class?

Is all you care about that you get a high loss only for “class X”?

Or do you also want just one non-class-X class to be predicted (rather
than a muddle of multiple non-class-X classes)?

I don’t really understand your use case. What do your training data
look like, and, in particular, what are the “ground-truth” labels for your
training data? Is each training sample labelled with its corresponding
“class-X”? Or is it labelled with the actual class of the sample,
together with the particular class-X that is evil for that sample?

Once you have your trained classifier together with any additional
post-processing, what do you want the output of your overall “system”
to be? At “inference” time, I give you a previously-unseen sample.
What output do you want your system to produce?

Best.

K. Frank