Which Loss function for One Hot Encoded labels

I am trying to build a feed forward network classifier that outputs 1 of 5 classes. Before, I was using the cross entropy loss function with label encoding. However, I read that label encoding might not be a good idea, since the model might assign a hierarchical ordering to the labels. So I am thinking about changing to one-hot encoded labels. I’ve also read that cross entropy loss is not ideal for one-hot encodings. What other loss functions can I look into in this case? Also, since a new column will be created for each class, how do I “collect” the labels and “feed” them to the proposed loss function?

For example, right now I have:

criterion = nn.CrossEntropyLoss()
...
loss = criterion(ypred_var, labels)

nn.CrossEntropyLoss should be a good fit in your case.
Passing class indices does not impose a hierarchical ordering on the labels; that would be a concern if you treated the labels as continuous regression targets, e.g. with nn.MSELoss.
Instead, the class index is used to select the predicted log-probability of the correct class, so this approach is mathematically identical to using one-hot encoded targets.

That being said, nn.CrossEntropyLoss expects class indices and does not take one-hot encoded tensors as target labels.
If you really need to use it for some other reasons, you would probably use .scatter_ to create your one-hot encoded targets.
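To see the equivalence concretely, here is a quick sketch (batch size and class count are made up for illustration) comparing nn.CrossEntropyLoss applied to class indices with the same quantity computed manually from one-hot targets:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)              # 4 samples, 5 classes
targets = torch.tensor([0, 2, 4, 1])    # class indices

# cross entropy with class indices
ce = F.cross_entropy(logits, targets)

# the same value computed from one-hot targets
one_hot = F.one_hot(targets, num_classes=5).float()
nll = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(ce, nll))  # True
```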


Ok so to be clear, if the five classes have labels 0 - 4 after label encoding then this is the class index and it is compatible with nn.CrossEntropyLoss that I am currently using?

Yes, that’s correct! You should make sure the target is stored as a torch.LongTensor.
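As a minimal sketch (model outputs and batch size are made up), the targets just need to be int64 class indices in [0, 4]:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(8, 5)            # raw model outputs for 8 samples, 5 classes
labels = torch.randint(0, 5, (8,))    # torch.randint returns int64, i.e. a LongTensor
loss = criterion(logits, labels)      # loss is a scalar tensor
```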


Hey, what if I really need to use one-hot encoded targets? Should I define my own nn.CrossEntropyLoss, since it expects label indices? Can you elaborate on what using .scatter_ means?

Yes, you could write a custom loss function that accepts one-hot encoded targets.
The scatter_ method can be used to create the one-hot targets, or alternatively you can use F.one_hot:

import torch
import torch.nn.functional as F

nb_classes = 3
target = torch.randint(0, nb_classes, (10,))

# scatter ones into a zero tensor at the target indices
one_hot_scatter = torch.zeros(10, nb_classes).scatter_(
    dim=1, index=target.unsqueeze(1), src=torch.ones(10, nb_classes))

# equivalent and more concise
one_hot = F.one_hot(target, num_classes=nb_classes)

I have tried building my own loss function for one hot encoding labels as below :

log_prob = torch.nn.functional.log_softmax(logits_q, dim=1)
loss= -torch.sum(log_prob)*y_qry[i]

but it raises the following error:

RuntimeError: grad can be implicitly created only for scalar outputs

and the gradients are computed via

grad = torch.autograd.grad(loss, params)

what may be wrong with the code?

The loss.backward() operation would implicitly create a gradient as torch.ones(1), if the loss is a scalar value.
Otherwise, if the loss is a tensor with multiple values, you would either have to provide the gradient explicitly or reduce the loss before, e.g. via loss.mean().backward().
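In the snippet above, the likely culprit is the parenthesis placement: torch.sum(log_prob) reduces to a scalar first, and multiplying that scalar by y_qry[i] produces a tensor again, which is why backward() complains about a non-scalar output. A sketch of a corrected version (variable names are made up; this assumes the targets are one-hot tensors) multiplies inside the sum and then reduces with a mean:

```python
import torch
import torch.nn.functional as F

def one_hot_cross_entropy(logits, one_hot_targets):
    # multiply before summing, then reduce the per-sample losses to a scalar
    log_prob = F.log_softmax(logits, dim=1)
    return -(one_hot_targets * log_prob).sum(dim=1).mean()

logits = torch.randn(6, 5, requires_grad=True)
targets = F.one_hot(torch.randint(0, 5, (6,)), num_classes=5).float()
loss = one_hot_cross_entropy(logits, targets)
loss.backward()  # works, since loss is now a scalar
```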