I was using torch.nn.functional.cross_entropy with one-hot vectors as the target when I realized that it expects class probabilities or class indices as the target, not one-hot vectors.
But my thought is, isn’t it the same thing? I mean, passing a one-hot vector as a FloatTensor is like setting 100% probability on one class and 0% on the others, right?
Indeed, with this simple example:
import torch
import torch.nn.functional as F

logit = torch.tensor([[0.1, 0.9], [0.8, 0.2]])  # example logits: 2 samples, 2 classes

# One-hot targets as a FloatTensor
t = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
print(F.cross_entropy(logit, t))

# Class indices
t = torch.tensor([1, 0], dtype=torch.int64)
print(F.cross_entropy(logit, t))
Could I run into any problems with the first approach? Is it correct?
Thank you!
You understand this correctly, and there is nothing wrong with the
first approach. (To be clear, cross_entropy() accepts targets that
are either floating-point “probabilistic” targets – including floating-point
one-hot encoded targets – with a class dimension, or integer class
labels, without a class dimension. If you one-hot encode your integer
class labels – and convert to floating-point – the two versions will
agree.)
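As a quick sketch (with arbitrary logits and labels, purely for illustration), you can verify that the two target formats produce the same loss:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)                        # 4 samples, 3 classes
idx = torch.tensor([2, 0, 1, 2])                  # integer class labels
one_hot = F.one_hot(idx, num_classes=3).float()   # same labels, one-hot floats

print(torch.allclose(F.cross_entropy(logits, idx),
                     F.cross_entropy(logits, one_hot)))  # True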
However, using integer class labels is modestly more efficient (even
if you start with one-hot encoded labels and convert them to integer
class labels), so there’s no reason to use one-hot encoded labels
with cross_entropy() (nor with CrossEntropyLoss).
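If your pipeline already hands you one-hot labels, one way to make the conversion (assuming the targets really are one-hot along the class dimension) is to recover the integer labels with argmax and pass those instead, for example:

import torch
import torch.nn.functional as F

one_hot = torch.tensor([[0.0, 1.0], [1.0, 0.0]])   # one-hot float targets
idx = one_hot.argmax(dim=1)                        # tensor([1, 0]), integer class labels

logits = torch.tensor([[0.1, 0.9], [0.8, 0.2]])    # illustrative logits
print(F.cross_entropy(logits, idx))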