Cross entropy between two softmax outputs

Hi all, I want to compute the cross-entropy between two 2D tensors that are the outputs of the softmax function.


softmax_out1 and softmax_out2 are 2D tensors of shape (128, 10), where 128 is the batch size and 10 is the number of classes.
The following error occurs:

RuntimeError: 1D target tensor expected, multi-target not supported

Any example code to handle this error would be appreciated.

In PyTorch, nn.CrossEntropyLoss combines LogSoftmax and NLLLoss. Your input to nn.CrossEntropyLoss should be the logits and the original targets (integer class labels), not the softmax probabilities themselves.

Also, nn.CrossEntropyLoss is a module class that has to be instantiated before it is called, so it should not be used as
loss = nn.CrossEntropyLoss(output, target)

but instead as below:
loss = nn.CrossEntropyLoss()(output, target)
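
To make the calling convention concrete, here is a minimal sketch. The random tensors are hypothetical stand-ins for a model's raw output and its labels, using the shapes from the question (batch of 128, 10 classes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the model output and the labels
logits = torch.randn(128, 10)            # raw scores, no softmax applied
target = torch.randint(0, 10, (128,))    # integer class labels, shape (128,)

criterion = nn.CrossEntropyLoss()        # instantiate the loss module first
loss = criterion(logits, target)         # then call it on (input, target)
print(loss.item())
```

Note that the target here is 1D with one class index per sample, which is the shape the "1D target tensor expected" error is asking for.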

Do share your training loop code if possible.

Hi S.!

The short answer is that you have to write your own cross-entropy
function to do what you want – see below.

There are two things going on here:

First, as Aman noted, the input to CrossEntropyLoss (your
softmax_out1) should be raw-score logits that range from -inf to
+inf, rather than probabilities that range from 0.0 to 1.0. So you
want to pass logits in as the input without converting them to
probabilities by running them through softmax().

Second, CrossEntropyLoss expects its target (your softmax_out2)
to be integer class labels (with shape [nBatch], rather than
[nBatch, nClass]). So CategoricalCrossEntropyWithLogitsLoss
might be a better (if lengthier) name for this loss function.

Now, how to do what you want:

Even if you write your own cross-entropy loss function, you do not
want to pass in probabilities for your input as doing so will be less
numerically stable than passing in logits.
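
A small illustration of the stability point, assuming a deliberately extreme pair of logits (the values are made up for the demonstration): taking the log of softmax probabilities can produce -inf, while computing log-probabilities directly from the logits does not.

```python
import torch

# Hypothetical logits with a very large gap between the two classes
logits = torch.tensor([[1000.0, 0.0]])

# Going through probabilities: the second probability underflows to 0,
# so its logarithm becomes -inf
naive = torch.log(torch.softmax(logits, dim=1))

# Working in log space directly keeps the value finite
stable = torch.log_softmax(logits, dim=1)

print(naive)
print(stable)
```

Once a -inf appears in the loss, the gradients are no longer usable, which is why the logits are the safer thing to pass in.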

It does, however, make sense to use probabilities (rather than integer
class labels) for your target. (These are sometimes called soft labels
or soft targets.) It’s just that PyTorch doesn’t offer such a version of
cross entropy.

The following post shows how to implement such a “soft cross-entropy”
loss. It takes logits for its input (for numerical stability) and takes
probabilities for its “soft-label” target:
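
The referenced post is not reproduced here. As a rough sketch of the idea (not necessarily that post's code), a soft-label cross entropy can be written with log_softmax. The function name soft_cross_entropy and the random tensors standing in for the two networks' raw outputs are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Batch mean of -sum_c soft_targets[c] * log_softmax(logits)[c]."""
    log_probs = F.log_softmax(logits, dim=1)   # stable log-probabilities
    return -(soft_targets * log_probs).sum(dim=1).mean()

torch.manual_seed(0)

# Hypothetical raw outputs (logits) of the two networks, shape (128, 10)
logits1 = torch.randn(128, 10, requires_grad=True)
logits2 = torch.randn(128, 10)

# Only the target side is converted to probabilities
soft_targets = F.softmax(logits2, dim=1)

loss = soft_cross_entropy(logits1, soft_targets)
loss.backward()
print(loss.item())
```

With one-hot targets this reduces to the usual integer-label cross entropy. As far as I know, more recent PyTorch releases (1.10 and later) also accept class probabilities as the target of CrossEntropyLoss directly, so it is worth checking the documentation for the version you have installed.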


K. Frank