How should I implement cross-entropy loss with continuous target outputs?

garytho · December 4, 2017, 1:46am

The current version of cross-entropy loss only accepts one-hot vectors for target outputs.
I need to implement a version of cross-entropy loss that supports continuous target distributions. What I don’t know is how to implement a version of cross-entropy loss that is numerically stable.

For example, would the following implementation work well?

output = model(input) #model output is a softmax distribution over 3 categories
target = Variable(torch.FloatTensor([0.1, 0.7, 0.2])) #target distribution is continuous – not one-hot
loss = -1 * torch.sum(target * torch.log(output)) #compute the cross-entropy
loss.backward()

Is this stable? Is there a built-in version I can use?

Thank you

sparseinference · December 4, 2017, 2:11am

I don’t know if it’s possible, but I’m interested in why you would want to use CrossEntropyLoss?

I have used MSELoss for similar things with good results.

Also, unless I’m very mistaken, the targets for nn.CrossEntropyLoss are not one-hot.

SimonW · December 4, 2017, 2:41am

Change softmax + log to nn.LogSoftmax and you are golden.

SimonW · December 4, 2017, 2:42am

Unfortunately, in current PyTorch’s CrossEntropyLoss, they are one-hot in the sense that target contains only one ground-truth class with “probability” 1.

Brando_Miranda · December 29, 2017, 9:21pm

Sorry, can you explain in a little more detail what you mean by your answer?

Change softmax + log to nn.LogSoftmax and you are golden

garytho · December 29, 2017, 9:26pm

instead of doing torch.log(<model_softmax_output>),

change the last layer of the neural network to LogSoftmax and remove the torch.log() from the loss equation.

Brando_Miranda · December 29, 2017, 9:28pm

may I see it in code?

Brando_Miranda · December 29, 2017, 9:40pm

I’m not doing

torch.log(<model_softmax_output>)

I’m doing

loss = criterion(y_pred, batch_ys)

with:

criterion = torch.nn.CrossEntropyLoss()

garytho · December 29, 2017, 9:49pm

If you are using torch.nn.CrossEntropyLoss() then you don’t need a softmax output layer on your model.

So it would just be

output = model(input) #logit output
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(output, target)
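As a rough illustration of why no softmax layer is needed: nn.CrossEntropyLoss applies log-softmax to the logits itself and then takes the negative log-likelihood of the target class. The sketch below assumes a recent PyTorch without Variable; the `logits` tensor is a made-up stand-in for `model(input)`, and `target` holds class indices rather than one-hot vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical stand-in for model(input): raw scores, no softmax
target = torch.tensor([0, 2, 1, 1])   # one class index per sample, not a one-hot vector

criterion = nn.CrossEntropyLoss()
loss_builtin = criterion(logits, target)

# The same value computed in two explicit steps: log-softmax, then NLL.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
```

If the model already ended in a softmax layer, passing its output to CrossEntropyLoss would apply softmax twice.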

Brando_Miranda · December 29, 2017, 10:01pm

If the output of the model is a probability distribution, the right thing is to use cross-entropy, as it's equivalent to MLE. Using square loss is something else (it usually assumes the noise is Gaussian).
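One way to spell out that equivalence, using notation that is not in the thread (p for the target distribution, q_θ for the model's predicted distribution): the cross-entropy splits into the entropy of the target plus a KL divergence, and only the KL term depends on the model. For a one-hot target it reduces to the negative log-likelihood of the true class.

```latex
\begin{align*}
H(p, q_\theta) &= -\sum_i p_i \log q_{\theta,i} \\
               &= \underbrace{-\sum_i p_i \log p_i}_{H(p)}
                  + \underbrace{\sum_i p_i \log \frac{p_i}{q_{\theta,i}}}_{\mathrm{KL}(p \,\|\, q_\theta)} \\
\text{one-hot } p = e_k: \quad H(e_k, q_\theta) &= -\log q_{\theta,k}
\end{align*}
```

Since H(p) does not depend on θ, minimizing the cross-entropy over θ is the same as minimizing KL(p ‖ q_θ), for soft targets as well as one-hot ones.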

Brando_Miranda · December 29, 2017, 10:02pm

Honestly, I’d rather not use torch.nn.CrossEntropyLoss(), and that’s why I was asking to look at the actual code you used.

garytho · December 29, 2017, 10:08pm

My code is specific to target distributions that are not one-hot. I don’t know if that’s what you want, but does this help?

output = model(input) #final layer of model is LogSoftmax(), so the output is the log-probability distribution
target = Variable(torch.FloatTensor([0.1, 0.7, 0.2])) #target probability distribution
loss = -1 * torch.sum(target * output) #the crossentropy formula is -1 * sum( log(output_dist) * target_dist)
loss.backward()
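A quick sanity check on that formula, as a sketch rather than the poster's actual code: with a one-hot target it should collapse to the usual negative log-likelihood of the true class. The `logits` tensor is a hypothetical stand-in for the pre-softmax scores of one sample, and the example assumes a recent PyTorch where plain tensors replace Variable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3)                     # hypothetical pre-softmax scores for one sample
log_probs = F.log_softmax(logits, dim=0)    # what a LogSoftmax final layer would output

# Soft target, as in the post above.
soft_target = torch.tensor([0.1, 0.7, 0.2])
soft_loss = -torch.sum(soft_target * log_probs)

# One-hot target: the same formula picks out -log_probs[1].
one_hot = torch.tensor([0.0, 1.0, 0.0])
one_hot_loss = -torch.sum(one_hot * log_probs)

# Reference value from the built-in NLL loss with class index 1.
reference = F.nll_loss(log_probs.unsqueeze(0), torch.tensor([1]))
```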

Brando_Miranda · December 29, 2017, 10:12pm

Yes, I don’t have one-hot vectors either; I’m learning a distribution or continuous target values as well.

Though I thought that wasn’t right (due to numerical issues), hence your question. Am I right?

garytho · December 29, 2017, 10:20pm

I’m not sure I understand what you are asking, but the code I wrote above works. The numerical problem arises when taking torch.log of the softmax distribution: a probability can underflow to exactly 0, log(0) is -inf, and the loss can then come out as inf or nan.
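To make the failure mode concrete, here is a small sketch with deliberately extreme, made-up logits (not values from the thread). In float32 the softmax of the smaller entries underflows to exactly 0, so taking the log afterwards gives -inf, while log_softmax works in log space and stays finite.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([0.0, 200.0, -200.0])   # deliberately extreme, hypothetical values
target = torch.tensor([0.1, 0.7, 0.2])

# Naive route: softmax first, then log. Two probabilities underflow to 0.
probs = F.softmax(logits, dim=0)
naive = -torch.sum(target * torch.log(probs))

# Stable route: log_softmax never materializes the tiny probabilities.
stable = -torch.sum(target * F.log_softmax(logits, dim=0))
```

Here `naive` is not finite while `stable` is about 100. If a target entry were exactly 0 at a position where the probability underflowed, the product 0 * -inf would give nan instead.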

imosafi · January 13, 2018, 5:53pm

Hi
I have the same problem and tried this solution, but it seems like it’s not working very well.
Is there a way to achieve this while keeping the model output as a regular softmax?

garytho · January 14, 2018, 4:27pm

What do you mean by “it’s not working very well”?

If you make the output of your neural network softmax, and then take the log of it, it will be slower than logsoftmax and sometimes the output will be nan.

There are other loss functions, but cross-entropy loss is arguably the best one for probability distributions.

imosafi · January 14, 2018, 5:13pm

I meant I don’t get good results (of course, there might be a different cause for this).
Anyway, I would feel more comfortable if the model could just output a softmax; that way I could make sure there is no problem there.

Hongyi_Zhang · January 19, 2018, 6:08am

The following code should work in PyTorch 0.2:

def cross_entropy(pred, soft_targets):
    logsoftmax = nn.LogSoftmax()
    return torch.mean(torch.sum(- soft_targets * logsoftmax(pred), 1))

assuming pred and soft_targets are both Variables with shape (batchsize, num_of_classes), each row of pred is predicted logits and each row of soft_targets is a discrete distribution.
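On more recent PyTorch versions, nn.LogSoftmax() without an explicit dimension triggers a deprecation warning, and Variable is no longer needed. A possible updated sketch of the same idea follows; the name `soft_cross_entropy` and the random test tensors are illustrative only.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(pred, soft_targets):
    """Batch mean of -sum_c soft_targets[c] * log_softmax(pred)[c].

    pred: logits of shape (batch_size, num_classes)
    soft_targets: rows are probability distributions, same shape as pred
    """
    log_probs = F.log_softmax(pred, dim=1)   # normalize over the class dimension
    return torch.mean(torch.sum(-soft_targets * log_probs, dim=1))

torch.manual_seed(0)
pred = torch.randn(5, 3, requires_grad=True)         # hypothetical logits
soft_targets = F.softmax(torch.randn(5, 3), dim=1)   # hypothetical soft labels, rows sum to 1
loss = soft_cross_entropy(pred, soft_targets)
loss.backward()
```

When each target row sums to 1, the gradient with respect to the logits is (softmax(pred) - soft_targets) / batch_size, which is one way to check the implementation.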

Diego999 · February 15, 2018, 11:47am

Is there now an “official” PyTorch function to do it, or should we still do it by hand?
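For readers on much later releases than the ones discussed in this thread: assuming PyTorch 1.10 or newer, F.cross_entropy (and nn.CrossEntropyLoss) also accepts a floating-point target with the same shape as the input and treats it as class probabilities. A minimal sketch comparing it with the by-hand formula, using made-up tensors:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)                           # hypothetical model output (raw scores)
soft_targets = F.softmax(torch.randn(5, 3), dim=1)   # hypothetical soft labels, rows sum to 1

# Assumes PyTorch >= 1.10: a float target shaped like the input is read as probabilities.
builtin = F.cross_entropy(logits, soft_targets)

# By-hand version for comparison.
manual = torch.mean(torch.sum(-soft_targets * F.log_softmax(logits, dim=1), dim=1))
```

On older versions the call raises an error because the target is expected to be a tensor of class indices, so the by-hand version is still needed there.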

smk508 · April 10, 2018, 8:07pm

I believe you can use BCELoss, as long as your label and outputs are represented as normalized vectors. For example,

loss_fn = nn.BCELoss()
softmax = nn.Softmax()
input = Variable(torch.randn(3))
output = softmax(input)
target = Variable(torch.FloatTensor([.1, .7, .2]))
loss = loss_fn(output, target)
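One caveat worth checking before relying on this: BCELoss treats every class as an independent binary problem and averages over elements, so the number it returns is not the categorical cross-entropy from earlier in the thread. The sketch below, with made-up logits, compares the two. Both are smallest when the output equals the target, but the values and gradients differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([1.0, 2.0, 0.5])     # hypothetical pre-softmax scores
output = F.softmax(logits, dim=0)
target = torch.tensor([0.1, 0.7, 0.2])

bce = nn.BCELoss()(output, target)

# What BCELoss computes: element-wise two-term binary cross-entropy, then the mean.
bce_manual = torch.mean(
    -(target * torch.log(output) + (1 - target) * torch.log(1 - output))
)

# Categorical cross-entropy with a soft target, as used earlier in the thread.
categorical = -torch.sum(target * torch.log(output))
```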