# Functional.cross_entropy with one-hot vector

I was using torch.nn.functional.cross_entropy with one-hot vectors as targets when I realized that it expects class probabilities or class indices as targets, not one-hot vectors.
But my thought is: isn’t that the same thing? I mean, isn’t using a one-hot vector as a FloatTensor like setting 100% probability on one class and 0% on the others?
Indeed, with this simple example:

```python
import torch
import torch.nn.functional as F

logits = torch.zeros([2, 2], dtype=torch.float32)
logits[0] = torch.tensor([-0.1, 21.0])
logits[1] = torch.tensor([10.1, -2.0])
```

I get the same result with both approaches:

```python
# One-hot targets as a FloatTensor
t = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
print(F.cross_entropy(logits, t))

# Class indices
t = torch.tensor([1, 0], dtype=torch.int64)
print(F.cross_entropy(logits, t))
```

Could the first approach cause any problems? Is it correct?
Thank you!

Hi Jacopo!

You understand this correctly, and there is nothing wrong with the
first approach. (To be clear, `cross_entropy()` accepts targets that
are either floating-point “probabilistic” targets – including floating-point
one-hot encoded targets – with a class dimension, or integer class
labels, without a class dimension. If you one-hot encode your integer
class labels – and convert to floating-point – the two versions will
agree.)
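As a quick sketch of this equivalence (the tensor values here just reuse the example from the question), the one-hot and index forms give the same loss, and more general "soft" probabilistic targets are accepted as well:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1, 21.0], [10.1, -2.0]])

# One-hot floating-point targets: 100% probability on one class per row
loss_onehot = F.cross_entropy(logits, torch.tensor([[0.0, 1.0], [1.0, 0.0]]))

# The same targets expressed as integer class labels
loss_idx = F.cross_entropy(logits, torch.tensor([1, 0]))

# The two agree (up to floating-point round-off)
print(torch.allclose(loss_onehot, loss_idx))

# cross_entropy() also accepts general probabilistic ("soft") targets,
# where each row is a probability distribution over the classes
loss_soft = F.cross_entropy(logits, torch.tensor([[0.3, 0.7], [0.9, 0.1]]))
```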

However, using integer class labels is modestly more efficient (even
if you start with one-hot encoded labels and convert them to integer
class labels), so there’s no reason to use one-hot encoded labels
with `cross_entropy()` (nor with `CrossEntropyLoss`).
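If your data arrives one-hot encoded, a minimal sketch of the conversion is a single `argmax()` over the class dimension:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1, 21.0], [10.1, -2.0]])
t_onehot = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

# Recover integer class labels from the one-hot rows
t_idx = t_onehot.argmax(dim=1)  # tensor([1, 0])

# Use the (more efficient) integer-label form of cross_entropy()
loss = F.cross_entropy(logits, t_idx)
```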

Best.

K. Frank
