Cross Entropy Loss get predicted class

Er_Hall · October 14, 2019, 8:14pm

Hi all,

I am using the cross entropy loss in my multiclass text classification problem, but I am confused. My targets are class indices in [0, c-1]. How can I obtain the predicted class? An example would be helpful. Also, since cross entropy loss uses softmax, why don’t I get probabilities that sum to 1 as the output?

ptrblck · October 14, 2019, 8:34pm

nn.CrossEntropyLoss expects logits, as internally F.log_softmax and nn.NLLLoss will be used.
If you want to get the predicted class, you could simply use torch.argmax:

output = model(input)
pred = torch.argmax(output, dim=1)

I assume dim1 represents the classes. If not, you should change the dim argument.
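As a rough end-to-end sketch of the snippet above: the model here is a hypothetical single nn.Linear layer standing in for a real classifier, and the sizes (10 input features, 5 classes, batch of 3) are made up for illustration. The logits go straight into the criterion, and torch.argmax over the class dimension gives the predicted indices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 5                          # hypothetical number of classes
model = nn.Linear(10, num_classes)       # stand-in for a real classifier (assumption)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(3, 10)              # batch of 3 samples, 10 features each
targets = torch.tensor([4, 0, 1])        # class indices in [0, num_classes - 1]

output = model(inputs)                   # raw logits, shape [3, 5]
loss = criterion(output, targets)        # logits are passed directly, no softmax
pred = torch.argmax(output, dim=1)       # predicted class index per sample, shape [3]

print(loss.item(), pred)
```

The predictions are class indices in the same [0, c-1] format as the targets, so they can be compared directly, e.g. with (pred == targets).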

Er_Hall · October 14, 2019, 8:46pm

Can you explain why this implementation for multiclass text classification doesn’t use sigmoid, given that the loss expects logits?

github.com

keishinkickback/Pytorch-RNN-text-classification/blob/master/model.py

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class RNN(nn.Module):

    def __init__(self, vocab_size, embed_size, num_output, rnn_model='LSTM', use_last=True, embedding_tensor=None,
                 padding_index=0, hidden_size=64, num_layers=1, batch_first=True):
        """

        Args:
            vocab_size: vocab size
            embed_size: embedding size
            num_output: number of output (classes)
            rnn_model:  LSTM or GRU
            use_last:  bool
            embedding_tensor:
            padding_index:
            hidden_size: hidden size of rnn module

This file has been truncated.

ptrblck · October 14, 2019, 8:49pm

sigmoid would convert each output to a probability in the range [0, 1].
Logits, on the other hand, are unbounded ((-inf, inf)), so you should not apply any activation function to your model outputs.
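A small check of why no activation is needed, using random made-up logits and targets: nn.CrossEntropyLoss on raw logits should give the same value as applying F.log_softmax followed by nn.NLLLoss manually. The sigmoid call at the end is only there to show that it treats each output independently, so the rows do not form a distribution over classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(3, 5)               # made-up raw model outputs: 3 samples, 5 classes
targets = torch.tensor([4, 4, 1])        # made-up class indices

# CrossEntropyLoss applied directly to the logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# The same computation spelled out: log_softmax followed by NLLLoss
loss_manual = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

same = torch.allclose(loss_ce, loss_manual)
print(same)

# sigmoid squashes every element on its own; rows are not normalized to sum to 1
sig = torch.sigmoid(logits)
print(sig.sum(dim=1))
```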

Er_Hall · October 14, 2019, 8:54pm

So if I have 5 output classes and 3 test instances, and I get, for example, the output below:
tensor([[ 0.4657, -0.7640, -1.4268, -0.5012,  1.2167],
        [-0.4578, -0.5621, -0.5652, -0.4056,  0.2509],
        [-0.5617,  0.8141, -0.1722, -0.1264,  0.2285]])

Does this mean that the right class is the 5th for the first instance, also the 5th for the second, and the 2nd for the last instance (the largest value in each row)?

ptrblck · October 14, 2019, 8:56pm

The class indices you mentioned might not be the right ones (that is determined by the target), but they are the predicted ones (those with the highest probability).

If you want to get probability values, you could use F.softmax to get values in the range [0, 1].
However, do not pass these values to the criterion. Use them only for debugging or printing purposes.
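For illustration, here is F.softmax applied to the logits posted above. Only the variable names are new. Each row becomes a distribution with values in [0, 1] that sums to 1.

```python
import torch
import torch.nn.functional as F

# Logits from the post above: 3 instances, 5 classes
logits = torch.tensor([[ 0.4657, -0.7640, -1.4268, -0.5012, 1.2167],
                       [-0.4578, -0.5621, -0.5652, -0.4056, 0.2509],
                       [-0.5617,  0.8141, -0.1722, -0.1264, 0.2285]])

probs = F.softmax(logits, dim=1)         # normalize over the class dimension
row_sums = probs.sum(dim=1)              # each entry should be (numerically) 1

print(probs)
print(row_sums)
```

These probabilities are for inspection only. The criterion still receives the raw logits.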

Er_Hall · October 14, 2019, 8:58pm

So the positions I mentioned above are the “predicted classes”, right?

Thank you very much for your replies, I really appreciate it!

ptrblck · October 14, 2019, 9:02pm

Yes, that’s correct. The highest logit (in this setup) gives you the predicted class.
That’s also why you can call torch.argmax directly on the logits without applying softmax, as this won’t change the predicted classes (max in logits will still be the max value after softmax).
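A quick way to convince yourself, using the logits from earlier in the thread: softmax is monotonic within each row, so taking argmax before or after it should return the same indices. Note that the indices are 0-based, so the "5th, 5th, 2nd" classes show up as 4, 4, 1.

```python
import torch
import torch.nn.functional as F

# Logits from the post above: 3 instances, 5 classes
logits = torch.tensor([[ 0.4657, -0.7640, -1.4268, -0.5012, 1.2167],
                       [-0.4578, -0.5621, -0.5652, -0.4056, 0.2509],
                       [-0.5617,  0.8141, -0.1722, -0.1264, 0.2285]])

pred_from_logits = torch.argmax(logits, dim=1)
pred_from_probs = torch.argmax(F.softmax(logits, dim=1), dim=1)

print(pred_from_logits)                               # tensor([4, 4, 1])
print(torch.equal(pred_from_logits, pred_from_probs)) # True
```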

R_S · September 17, 2021, 7:32am

I used torch.argmax before training the model to convert one-hot encoded data of shape (b, class, h, w) = (16, 12, 256, 256) to (16, 256, 256). Now I can use the cross entropy loss function. Is this the right way to train a model for a segmentation task?

ptrblck · September 17, 2021, 7:34am

Yes, as explained here.
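A minimal sketch of that conversion, with a much smaller made-up shape (2, 12, 8, 8) instead of (16, 12, 256, 256) and random tensors standing in for the real masks and the model output. The one-hot mask is built here only so the example is self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

b, c, h, w = 2, 12, 8, 8                 # small stand-in for (16, 12, 256, 256)

# Build a hypothetical one-hot mask of shape [b, c, h, w]
class_idx = torch.randint(0, c, (b, h, w))
one_hot = F.one_hot(class_idx, num_classes=c).permute(0, 3, 1, 2)

# Convert one-hot masks to class indices: [b, c, h, w] -> [b, h, w]
target = torch.argmax(one_hot, dim=1)

# Random logits standing in for the segmentation model output, shape [b, c, h, w]
logits = torch.randn(b, c, h, w, requires_grad=True)

# CrossEntropyLoss takes logits [b, c, h, w] and integer targets [b, h, w]
loss = nn.CrossEntropyLoss()(logits, target)
loss.backward()

print(target.shape, loss.item())
```

The target ends up as an int64 tensor of class indices per pixel, which is the format nn.CrossEntropyLoss expects for this kind of input.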