# Functional.cross_entropy with one-hot vector

I was using torch.nn.functional.cross_entropy with one-hot vectors as targets when I realized that it expects class probabilities or class indices as targets, not one-hot vectors.
But my thought is: isn’t that the same thing? I mean, isn’t using a one-hot vector as a FloatTensor like setting 100% probability on one class and 0% on the others?
Indeed, with this simple example:

```python
import torch
import torch.nn.functional as F

logits = torch.zeros([2, 2], dtype=torch.float32)
logits[0] = torch.tensor([-0.1, 21.0])
logits[1] = torch.tensor([10.1, -2.0])
```

I get the same result with both approaches:

```python
# One-hot targets as a FloatTensor
t = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
print(F.cross_entropy(logits, t))

# Class indices
t = torch.tensor([1, 0], dtype=torch.int64)
print(F.cross_entropy(logits, t))
```

Could the first approach cause any problems? Is it correct?
Thank you!

Hi Jacopo!

You understand this correctly, and there is nothing wrong with the
first approach. (To be clear, `cross_entropy()` accepts targets that
are either floating-point “probabilistic” targets – including floating-point
one-hot encoded targets – with a class dimension, or integer class
labels, without a class dimension. If you one-hot encode your integer
class labels – and convert to floating-point – the two versions will
agree.)
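As a quick sketch of this equivalence (the tensor values here just reuse the example from the question), the one-hot and index forms give the same loss, and more general "soft" probabilistic targets are accepted as well:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1, 21.0], [10.1, -2.0]])

# One-hot floating-point targets: 100% probability on one class per row
loss_onehot = F.cross_entropy(logits, torch.tensor([[0.0, 1.0], [1.0, 0.0]]))

# The same targets expressed as integer class labels
loss_idx = F.cross_entropy(logits, torch.tensor([1, 0]))

# The two agree (up to floating-point round-off)
print(torch.allclose(loss_onehot, loss_idx))

# cross_entropy() also accepts general probabilistic ("soft") targets,
# where each row is a probability distribution over the classes
loss_soft = F.cross_entropy(logits, torch.tensor([[0.3, 0.7], [0.9, 0.1]]))
```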

However, using integer class labels is modestly more efficient (even
if you start with one-hot encoded labels and convert them to integer
class labels), so there’s no reason to use one-hot encoded labels
with `cross_entropy()` (nor with `CrossEntropyLoss`).
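If your data arrives one-hot encoded, a minimal sketch of the conversion is a single `argmax()` over the class dimension:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-0.1, 21.0], [10.1, -2.0]])
t_onehot = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

# Recover integer class labels from the one-hot rows
t_idx = t_onehot.argmax(dim=1)  # tensor([1, 0])

# Use the (more efficient) integer-label form of cross_entropy()
loss = F.cross_entropy(logits, t_idx)
```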

Best.

K. Frank
