Cross-entropy with one-hot targets

Dawid_S · February 12, 2018, 10:29pm

I’d like to use a cross-entropy loss function that can take one-hot encoded values as the target.

import torch
import torch.nn as nn

# Fake NN output
out = torch.FloatTensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]])
out = torch.autograd.Variable(out)

# Categorical targets
y = torch.LongTensor([1, 2, 0])
y = torch.autograd.Variable(y)

# One-hot encoded targets
y1 = torch.FloatTensor([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y1 = torch.autograd.Variable(y1)

# Calculating the loss
loss_val = nn.CrossEntropyLoss()(out, y)
loss_val1 = nn.BCEWithLogitsLoss()(out, y1)

print(loss_val)
print(loss_val1)

Variable containing:
0.6178
[torch.FloatTensor of size 1]

Variable containing:
0.5927
[torch.FloatTensor of size 1]

The outputs of the two functions are not the same.

How can I implement cross-entropy so that I get the same output when passing one-hot encoded targets?

richard · February 12, 2018, 11:21pm

Try:

# argmax over the class dimension (dim=1) recovers one class index per row
_, targets = y1.max(dim=1)
nn.CrossEntropyLoss()(out, targets)

Dawid_S · February 13, 2018, 1:06pm

That’s not what I mean. I need to pass a one-hot vector, because later I want to use smoothed values as targets (for example [0.1, 0.1, 0.8]). max() won’t help here.

The first thing I want to achieve is to get the same result from CrossEntropyLoss() and from some loss that takes one-hot encoded values, without smoothing.
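
For context, smoothed targets like [0.1, 0.1, 0.8] can be built from class indices by mixing a one-hot vector with a uniform distribution. The sketch below is only an illustration: the helper name smooth_one_hot and the value eps=0.3 are assumptions chosen so that the result matches the example in the post.

```python
import torch

def smooth_one_hot(labels, num_classes, eps):
    """Hypothetical helper: mix one-hot targets with a uniform distribution."""
    one_hot = torch.zeros(labels.size(0), num_classes)
    # Put a 1.0 at the column given by each label
    one_hot.scatter_(1, labels.unsqueeze(1), 1.0)
    # Keep (1 - eps) on the true class and spread eps evenly over all classes
    return one_hot * (1.0 - eps) + eps / num_classes

labels = torch.tensor([2, 1, 0])
soft = smooth_one_hot(labels, num_classes=3, eps=0.3)
print(soft)  # first row is approximately [0.1, 0.1, 0.8]
```

Each row still sums to 1, so it remains a valid probability distribution, but taking max() of it throws the smoothing away.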

richard · February 13, 2018, 3:18pm

nn.CrossEntropyLoss doesn’t take a one-hot vector; it takes class indices. You can create a new function that wraps nn.CrossEntropyLoss, in the following manner:

def cross_entropy_one_hot(input, target):
    # target has shape (N, C); the argmax over dim=1 gives one label per row
    _, labels = target.max(dim=1)
    return nn.CrossEntropyLoss()(input, labels)
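
A quick way to check such a wrapper is to compare it with the loss computed from the class indices directly. The sketch below reuses the values from the first post and takes the argmax along dim=1 (the class dimension), so that each row yields one label; it uses the functional API and plain tensors, which is an assumption about the PyTorch version in use.

```python
import torch
import torch.nn.functional as F

def cross_entropy_one_hot(logits, one_hot_target):
    # Recover one class index per row from the one-hot rows
    labels = one_hot_target.argmax(dim=1)
    return F.cross_entropy(logits, labels)

out = torch.tensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]])
y = torch.tensor([1, 2, 0])
y1 = torch.tensor([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])

loss_wrapped = cross_entropy_one_hot(out, y1)
loss_direct = F.cross_entropy(out, y)
print(loss_wrapped.item(), loss_direct.item())  # both about 0.6178
```

Taking the argmax along dim=0 instead would return, for each class, the row in which it occurs ([2, 0, 1] here), which is not the label vector.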

Also, I’m not sure I understand what you want. nn.BCEWithLogitsLoss and nn.CrossEntropyLoss are documented as different losses; I’m not sure in what situation you would expect the same value from them.

Dawid_S · February 13, 2018, 9:36pm

Sorry for not being clear enough.

I want to use soft targets with cross entropy. A soft target is a one-hot-like vector, but with continuous values rather than exactly zero or one. An example target is in the code below.

This is what I want:

def cross_entropy(input, target, size_average=True):
    """Cross entropy that accepts soft targets.

    Args:
        input: raw (unnormalized) predictions of the network, shape (N, C)
        target: targets of the same shape, can be soft
        size_average: if False, the sum is returned instead of the mean

    Examples::

        input = torch.FloatTensor([[1.1, 2.8, 1.3], [1.1, 2.1, 4.8]])
        input = torch.autograd.Variable(input, requires_grad=True)

        target = torch.FloatTensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])
        target = torch.autograd.Variable(target)
        loss = cross_entropy(input, target)
        loss.backward()
    """
    # Normalize explicitly over the class dimension
    logsoftmax = nn.LogSoftmax(dim=1)
    if size_average:
        return torch.mean(torch.sum(-target * logsoftmax(input), dim=1))
    else:
        return torch.sum(torch.sum(-target * logsoftmax(input), dim=1))

I’m wondering whether there is a PyTorch function that accepts soft targets, as in the example above. I was trying with BCELoss.
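
One way to gain confidence in a hand-written soft-target loss is to check that it reduces to the built-in loss when the targets are exactly one-hot. The following is a minimal sketch under that idea; the name soft_cross_entropy and the reduction argument are illustrative, not part of PyTorch. As far as I know, PyTorch 1.10 and later also let nn.CrossEntropyLoss take class probabilities as the target, but the sketch does not rely on that.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target, reduction="mean"):
    # Log-probabilities over the class dimension
    log_probs = F.log_softmax(logits, dim=1)
    # Cross entropy per sample: -sum_k target_k * log p_k
    per_sample = -(target * log_probs).sum(dim=1)
    return per_sample.mean() if reduction == "mean" else per_sample.sum()

logits = torch.tensor([[1.1, 2.8, 1.3], [1.1, 2.1, 4.8]])
hard = torch.tensor([1, 2])
one_hot = F.one_hot(hard, num_classes=3).float()
soft = torch.tensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])

loss_one_hot = soft_cross_entropy(logits, one_hot)
loss_builtin = F.cross_entropy(logits, hard)   # same value as loss_one_hot
loss_soft = soft_cross_entropy(logits, soft)   # larger: mass on unlikely classes
print(loss_one_hot.item(), loss_builtin.item(), loss_soft.item())
```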

richard · February 13, 2018, 10:18pm

Ah I see. Thank you for your clarification.

BCELoss doesn’t quite do what you want it to do, because it has that extra term on the right (and I presume you only want the term on the left?)

There’s no built in PyTorch function to do this right now, but you can use the cross_entropy function you defined and autograd will work with it. Are you finding that this function is too slow as it is?
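
To make the "extra term" concrete: per element, binary cross entropy is -[t * log(p) + (1 - t) * log(1 - p)], where p is the sigmoid of the logit. The sketch below splits it into the two parts, reusing out and y1 from the first post; the names left and right are only illustrative.

```python
import torch
import torch.nn.functional as F

out = torch.tensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]])
y1 = torch.tensor([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])

p = torch.sigmoid(out)
left = -(y1 * torch.log(p))             # term weighted by the target
right = -((1 - y1) * torch.log(1 - p))  # extra term weighted by (1 - target)

bce_manual = (left + right).mean()      # mean over all N * C elements
bce_builtin = F.binary_cross_entropy_with_logits(out, y1)
print(bce_manual.item(), bce_builtin.item())  # both about 0.5927
```

The right-hand term penalizes every class whose target is below 1, which softmax cross entropy does only indirectly through the normalization.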

Dawid_S · February 14, 2018, 7:29am

You’re right. However, I’ve heard a few times (for example in the thread "Labels smoothing and categorical loss functions - alternatives?") that one can somehow use BCELoss for soft targets, and I was trying to figure it out.

The function I defined above is twice as fast as CrossEntropyFunction on CPU. That’s not a real reason, but somehow I’m not sure I can trust it.

richard · February 14, 2018, 4:47pm

# Calculating the loss
loss_val = nn.CrossEntropyLoss()(out, y)         # (1)
loss_val1 = nn.BCEWithLogitsLoss()(out, y1)   # (2)

If you’re referring to these, you definitely shouldn’t trust (1) because it doesn’t give you the behavior you want. (2) works, but has an extra term that may or may not affect your training.

evanthebouncy · May 18, 2018, 11:38pm

Best to implement your own.

Cross-entropy loss is something like this, I think:

[0.1, 0.2, 0.7] (prediction) ------------------ [1.0, 0.0, 0.0] (target)

What you want is -(1.0 * log(0.1) + 0.0 * log(0.2) + 0.0 * log(0.7)); this is the cross-entropy loss.

So, to translate that into code, you have prediction (a vector of length k) and target (a vector of length k, not necessarily one-hot).

What you would do is something like -1 * sum(log(prediction) * target).

This is what I have in my own code; hopefully it’s helpful:

# simple cross entropy cost (might be numerically unstable if pred has 0)
def xentropy_cost(x_target, x_pred):
    assert x_target.size() == x_pred.size(), \
        "size fail ! " + str(x_target.size()) + " " + str(x_pred.size())
    logged_x_pred = torch.log(x_pred)
    cost_value = -torch.sum(x_target * logged_x_pred)
    return cost_value
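
The instability mentioned in the comment shows up as soon as a predicted probability is exactly 0, because 0 * log(0) evaluates to nan. One common workaround, sketched below as an assumption rather than as part of the code above, is to clamp the probabilities before taking the log; the name xentropy_cost_clamped and the eps value are illustrative.

```python
import torch

def xentropy_cost_clamped(x_target, x_pred, eps=1e-12):
    # Clamp probabilities away from 0 so that log() stays finite
    return -torch.sum(x_target * torch.log(x_pred.clamp(min=eps)))

target = torch.tensor([1.0, 0.0, 0.0])
pred = torch.tensor([0.1, 0.2, 0.7])
cost = xentropy_cost_clamped(target, pred)  # -log(0.1), about 2.3026

pred_with_zero = torch.tensor([0.5, 0.5, 0.0])
naive = -torch.sum(target * torch.log(pred_with_zero))  # 0 * -inf gives nan
safe = xentropy_cost_clamped(target, pred_with_zero)    # -log(0.5), about 0.6931
print(cost.item(), naive.item(), safe.item())
```

When the network outputs logits rather than probabilities, computing log-probabilities directly with log_softmax avoids the problem without clamping.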

Zhang_Chi · August 31, 2020, 1:14am

Hello. If the target y_n is a one-hot label, shouldn’t the result be the same as the cross-entropy loss? Why do the two losses give different results in the examples?

jiahuei · November 24, 2020, 11:54am

nn.CrossEntropyLoss computes softmax cross entropy.
nn.BCEWithLogitsLoss computes sigmoid cross entropy.

That’s why they are different.
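
The difference can be seen in the probabilities themselves. In this small sketch, which reuses out and y from the first post, softmax normalizes each row so the classes compete, while sigmoid treats every entry as an independent yes/no decision.

```python
import torch

out = torch.tensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]])
y = torch.tensor([1, 2, 0])

p_softmax = torch.softmax(out, dim=1)  # each row sums to 1
p_sigmoid = torch.sigmoid(out)         # each entry lies in (0, 1) independently

# Softmax cross entropy: -log of the probability assigned to the true class
softmax_ce = -torch.log(p_softmax[torch.arange(3), y]).mean()

print(p_softmax.sum(dim=1))  # approximately [1, 1, 1]
print(p_sigmoid.sum(dim=1))  # larger than 1 for these inputs
print(softmax_ce.item())     # about 0.6178
```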

jinhyun-so · January 2, 2021, 9:45pm

I would like to implement exactly the same function as you: a cross entropy that takes a soft, one-hot-like target vector as input.

I tried to train a CNN with your cross_entropy function, but it seems that backpropagation through this function does not work properly.

Do I need to implement the backward pass for this function myself?
If so, please give me some guidance.
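
As noted earlier in the thread, autograd works with a loss built from differentiable tensor operations, so no hand-written backward pass should be needed. A quick sanity check, sketched below, is to compare the autograd gradient with the analytic one, (softmax(logits) - target) / N, which holds when each target row sums to 1. This is only a check of the loss itself, not a diagnosis of the training problem described above; soft_cross_entropy is an illustrative name.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target):
    # Mean over samples of -sum_k target_k * log_softmax(logits)_k
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.tensor([[1.1, 2.8, 1.3], [1.1, 2.1, 4.8]], requires_grad=True)
target = torch.tensor([[0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])

loss = soft_cross_entropy(logits, target)
loss.backward()  # autograd fills logits.grad

# Analytic gradient for targets whose rows sum to 1
expected_grad = (F.softmax(logits.detach(), dim=1) - target) / logits.size(0)
print(torch.allclose(logits.grad, expected_grad, atol=1e-6))
```

If the gradients match but training still misbehaves, the cause is more likely elsewhere, for example targets that do not sum to 1 or probabilities passed in where logits are expected.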