Label smoothing for only a subset of classes

KFrank · March 9, 2021, 3:31pm

Hi macazinc!

Here’s what I think you’re asking:

You have a multi-class (30 classes) classification problem. You know
that for most of your classes, your ground-truth target labels are
correct, but your labels sometimes mix up two of your classes, say 4
and 9. Let’s say that a sample labelled 4 is actually a 9 25% of the
time, and that a sample labelled 9 is actually a 4 10% of the time.

You will (most likely) want to use cross-entropy loss, but (as of this
writing) pytorch’s built-in version only takes integer categorical class
labels for its target. (Releases from 1.10 on also accept class
probabilities as the target.) In your case, you want what I call soft
labels, so on older versions you will have to write your own soft-label
version of cross-entropy. See this post
for an implementation:
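
In case the linked post isn’t handy, here is a minimal sketch of what a soft-label cross-entropy could look like. The name soft_cross_entropy and the mean-over-batch reduction are my own choices for illustration, not something taken from that post. The check at the end just confirms that with pure zero-or-one targets it agrees with the built-in integer-label version.

```python
import torch
import torch.nn.functional as F


def soft_cross_entropy(logits, target):
    """Cross-entropy for probabilistic ("soft") targets.

    logits: (batch, n_classes) raw, unnormalized scores
    target: (batch, n_classes) probabilities, each row summing to one
    (hypothetical helper; reduction is a mean over the batch)
    """
    log_probs = F.log_softmax(logits, dim=1)
    # per-sample loss is -sum_c target[c] * log_prob[c]
    return -(target * log_probs).sum(dim=1).mean()


torch.manual_seed(0)
logits = torch.randn(5, 30)           # fake predictions for 30 classes
labels = torch.randint(0, 30, (5,))   # fake integer class labels

# soft labels that all happen to be zero or one
one_hot = F.one_hot(labels, num_classes=30).float()

builtin = F.cross_entropy(logits, labels)
soft = soft_cross_entropy(logits, one_hot)
print(torch.allclose(builtin, soft, atol=1e-6))
```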

Now let me assume that your (sometimes incorrect) target labels
are given as integer categorical labels. First use one_hot() (followed
by float()) to convert your categorical labels into soft labels (that all
happen to be zero or one). Then whenever a sample is labelled 4
(target[i, 4] == 1.0), set target[i, 4] = 0.75 and
target[i, 9] = 0.25. Similarly, when target[i, 9] == 1.0, set
target[i, 9] = 0.90, and target[i, 4] = 0.10.
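
The relabelling above might be sketched like this. The function name make_soft_targets is made up for the example, and the 0.75 / 0.25 and 0.90 / 0.10 splits are just the confusion rates assumed earlier in this post. Boolean masks pick out the rows labelled 4 or 9, so no explicit loop over samples is needed.

```python
import torch
import torch.nn.functional as F


def make_soft_targets(labels, n_classes=30):
    """Turn integer labels into soft labels, softening only classes 4 and 9.

    (hypothetical helper; the probabilities are the example values
    assumed in this thread, not measured ones)
    """
    # one-hot soft labels: every entry is 0.0 or 1.0
    target = F.one_hot(labels, num_classes=n_classes).float()

    labelled_4 = labels == 4
    labelled_9 = labels == 9

    # a sample labelled 4 is assumed to really be a 9 25% of the time
    target[labelled_4, 4] = 0.75
    target[labelled_4, 9] = 0.25

    # a sample labelled 9 is assumed to really be a 4 10% of the time
    target[labelled_9, 9] = 0.90
    target[labelled_9, 4] = 0.10

    return target


labels = torch.tensor([4, 9, 0, 17])
target = make_soft_targets(labels)
print(target[:, [0, 4, 9, 17]])  # only the columns of interest
```

Rows for every other class stay exactly one-hot, and each row still sums to one, so the result is a valid probability target.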

You then feed the soft-label target you constructed into the soft-label
cross-entropy you implemented yourself.

Best.

K. Frank