Bespoke Cross Entropy Loss

Is it possible to use cross entropy loss where some mistakes are penalized less? E.g., let us assume we have four categories: horse, zebra, whale, shark. If the network chooses a horse instead of a zebra, the error should be smaller than if it chooses a whale instead of a zebra, and so forth.

Hello Tank!

Yes, you can certainly do something like this, and it is a reasonable
thing to consider.

One way to do it is as follows:

First, remember that CrossEntropyLoss, as implemented in pytorch,
is a special case of cross entropy. Cross entropy is a measure of the
mismatch between two probability distributions – your predicted
distribution and your target (known, “ground truth”) distribution.

Your predicted distribution is a set of class probabilities that sum to 1.
But for pytorch’s CrossEntropyLoss, your target is not really a full
probability distribution – it is a single integer class label. That is, it
represents one class having known probability of 1, and the other
classes having probability 0.
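
For reference, here is a minimal sketch of that standard usage, where the target is a tensor of integer class indices (the class names and ordering are just the ones from your example):

```python
import torch
import torch.nn as nn

# Standard usage of pytorch's CrossEntropyLoss: the target is a
# single integer class index per sample, not a full distribution.
criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 4)     # (batch, num_classes) raw scores from the network
labels = torch.tensor([1, 3])  # integer class labels, e.g. zebra and shark
loss = criterion(logits, labels)
```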

What you could do is write your own cross-entropy loss that takes a
full set of class probabilities as its targets. You could, then, for example,
assign your target probabilities as follows:

P = [horse, zebra, whale, shark]
class label = horse:
   P = [0.8, 0.2, 0.0, 0.0]
class label = zebra:
   P = [0.3, 0.7, 0.0, 0.0]
class label = whale:
   P = [0.0, 0.0, 0.9, 0.1]
class label = shark:
   P = [0.0, 0.0, 0.1, 0.9]

This would be instead of the “pure” single-class probabilities that
CrossEntropyLoss uses implicitly:

   P = [1.0, 0.0, 0.0, 0.0]
   P = [0.0, 1.0, 0.0, 0.0]
   P = [0.0, 0.0, 1.0, 0.0]
   P = [0.0, 0.0, 0.0, 1.0]

respectively.
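
Here is a minimal sketch of such a soft-target cross entropy, assuming the class order [horse, zebra, whale, shark] and the probabilities above (the function name soft_cross_entropy is mine, not a pytorch API):

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    # logits:       (batch, num_classes) raw network outputs
    # target_probs: (batch, num_classes) rows that sum to 1
    log_probs = F.log_softmax(logits, dim=1)
    return -(target_probs * log_probs).sum(dim=1).mean()

# Soft target distributions, indexed [horse, zebra, whale, shark]
soft_targets = torch.tensor([
    [0.8, 0.2, 0.0, 0.0],  # class label = horse
    [0.3, 0.7, 0.0, 0.0],  # class label = zebra
    [0.0, 0.0, 0.9, 0.1],  # class label = whale
    [0.0, 0.0, 0.1, 0.9],  # class label = shark
])

labels = torch.tensor([1, 3])   # e.g. a zebra and a shark
logits = torch.randn(2, 4)      # stand-in for your network output
loss = soft_cross_entropy(logits, soft_targets[labels])
```

(Newer versions of pytorch also let CrossEntropyLoss accept class probabilities as the target directly, if that is available to you, but the hand-rolled version above makes the idea explicit.)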

Or you could do something like:

For each of the three wrong answers, you could calculate its pairwise
cross entropy with the right answer, and sum these together with
weights. Thus, you could weight the horse-zebra misprediction cross
entropy less, for example, than the whale-zebra misprediction cross
entropy.

(As a matter of nomenclature, I probably wouldn’t call this second
approach cross entropy, but it’s related, and might do what you want.)
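
If you want to experiment with this second idea, here is one rough sketch under my own interpretation: build a two-class problem from each (right, wrong) pair of logits, compute its cross entropy with the right answer as the target, and combine the terms with per-pair weights. The cost matrix below is made up purely for illustration:

```python
import torch
import torch.nn.functional as F

# cost[i, j] = weight applied to confusing true class i with wrong class j,
# indexed [horse, zebra, whale, shark]; values here are illustrative only.
cost = torch.tensor([
    [0.0, 0.2, 1.0, 1.0],   # horse
    [0.2, 0.0, 1.0, 1.0],   # zebra
    [1.0, 1.0, 0.0, 0.1],   # whale
    [1.0, 1.0, 0.1, 0.0],   # shark
])

def pairwise_weighted_loss(logits, labels, cost):
    batch, num_classes = logits.shape
    total = logits.new_zeros(())
    for i in range(batch):
        true = labels[i].item()
        for wrong in range(num_classes):
            if wrong == true:
                continue
            # Two-class cross entropy on the (true, wrong) logit pair,
            # with the true class (index 0 of the pair) as the target.
            pair_logits = logits[i, [true, wrong]].unsqueeze(0)
            pair_target = torch.zeros(1, dtype=torch.long)
            term = F.cross_entropy(pair_logits, pair_target)
            total = total + cost[true, wrong] * term
    return total / batch

logits = torch.randn(2, 4)
labels = torch.tensor([1, 2])   # e.g. a zebra and a whale
loss = pairwise_weighted_loss(logits, labels, cost)
```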

Good luck.

K. Frank
