Is it possible to use cross-entropy loss where some mistakes are penalized less? E.g. let us assume we have four categories: horse, zebra, whale, shark. If the network chooses a horse instead of a zebra, the error should be smaller than if it chooses a whale instead of a zebra, and so forth.

Hello Tank!

Yes, you can certainly do something like this, and it is a reasonable thing to consider.

*One* way to do it is as follows:

First, remember that `CrossEntropyLoss`, as implemented in pytorch, is a special case of cross entropy. Cross entropy is a measure of the mismatch between two probability distributions – your predicted distribution and your target (known, "ground truth") distribution. Your predicted distribution is a set of class probabilities that sum to 1. But for pytorch's `CrossEntropyLoss`, your *target* is not really a full probability distribution – it is a single integer class label. That is, it represents one class having known probability of 1, and the other classes having probability 0.
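
To make that concrete, here is a small check (just a sketch using `torch.nn.functional`) showing that the built-in loss with an integer label matches cross entropy computed against the implied one-hot distribution:

```
import torch
import torch.nn.functional as F

logits = torch.randn(3, 4)          # three samples, four classes
labels = torch.tensor([0, 2, 3])    # integer class labels

# built-in cross entropy with integer targets ...
builtin = F.cross_entropy(logits, labels)

# ... equals cross entropy against the implied one-hot distributions
one_hot = F.one_hot(labels, num_classes=4).float()
manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(builtin, manual))   # True
```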

What you could do is write your own cross-entropy loss that takes a full set of class probabilities as its targets. You could then, for example, assign your target probabilities as follows:

```
P = [horse, zebra, whale, shark]
class label = horse:
P = [0.8, 0.2, 0.0, 0.0]
class label = zebra:
P = [0.3, 0.7, 0.0, 0.0]
class label = whale:
P = [0.0, 0.0, 0.9, 0.1]
class label = shark:
P = [0.0, 0.0, 0.1, 0.9]
```

This would be instead of the "pure" single-class probabilities that `CrossEntropyLoss` uses implicitly:

```
P = [1.0, 0.0, 0.0, 0.0]
P = [0.0, 1.0, 0.0, 0.0]
P = [0.0, 0.0, 1.0, 0.0]
P = [0.0, 0.0, 0.0, 1.0]
```

respectively.
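
If it helps, here is a minimal sketch of what such a soft-target cross entropy could look like. The table of probabilities is just the illustrative one from above, and `soft_cross_entropy` is a made-up helper name, not a pytorch function:

```
import torch
import torch.nn.functional as F

# illustrative target distributions, one row per true class,
# in the order horse, zebra, whale, shark (same numbers as above)
soft_targets = torch.tensor([
    [0.8, 0.2, 0.0, 0.0],   # horse
    [0.3, 0.7, 0.0, 0.0],   # zebra
    [0.0, 0.0, 0.9, 0.1],   # whale
    [0.0, 0.0, 0.1, 0.9],   # shark
])

def soft_cross_entropy(logits, labels, target_table=soft_targets):
    # cross entropy of the predicted distribution against a full
    # target distribution looked up by integer class label
    log_probs = F.log_softmax(logits, dim=1)           # (batch, n_classes)
    targets = target_table.to(logits.device)[labels]   # (batch, n_classes)
    return -(targets * log_probs).sum(dim=1).mean()

# usage: batch of two samples, true classes zebra and shark
logits = torch.randn(2, 4, requires_grad=True)
labels = torch.tensor([1, 3])
loss = soft_cross_entropy(logits, labels)
loss.backward()
```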

Or you could do something like:

For each of the three wrong answers, you could calculate their pairwise cross entropies with the right answer, and sum them together with weights. Thus, you could weight the horse-zebra misprediction cross entropy less, for example, than the whale-zebra misprediction cross entropy.

(As a matter of nomenclature, I probably wouldn't call this second approach cross entropy, but it's related, and might do what you want.)
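
If that second idea appeals to you, here is a rough sketch of one way to read it, where "pairwise cross entropy" means the two-class cross entropy between the correct class and each wrong class. The weight matrix and the name `weighted_pairwise_loss` are purely illustrative:

```
import torch
import torch.nn.functional as F

# illustrative pairwise weights: how much a (true class, wrong class)
# confusion should count, with smaller numbers for "forgivable" confusions
# such as horse / zebra; order is horse, zebra, whale, shark
pair_weights = torch.tensor([
    [0.0, 0.2, 1.0, 1.0],   # true horse
    [0.2, 0.0, 1.0, 1.0],   # true zebra
    [1.0, 1.0, 0.0, 0.2],   # true whale
    [1.0, 1.0, 0.2, 0.0],   # true shark
])

def weighted_pairwise_loss(logits, labels, weights=pair_weights):
    probs = F.softmax(logits, dim=1)                   # (batch, n_classes)
    p_correct = probs.gather(1, labels.unsqueeze(1))   # (batch, 1)
    # two-class cross entropy of (correct class, wrong class j):
    # -log(p_correct / (p_correct + p_j)), broadcast over all j
    pairwise = -torch.log(p_correct / (p_correct + probs))
    w = weights.to(logits.device)[labels]              # row of weights per true class
    return (w * pairwise).sum(dim=1).mean()

# usage: true classes zebra and shark
logits = torch.randn(2, 4, requires_grad=True)
labels = torch.tensor([1, 3])
loss = weighted_pairwise_loss(logits, labels)
loss.backward()
```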

Good luck.

K. Frank