Softmax + Cross-Entropy Loss

Hello,

My network has a Softmax activation plus a Cross-Entropy loss, which some refer to as Categorical Cross-Entropy loss.

For binary classification, do I need one-hot encoding for a network like this to work in PyTorch? I am currently using integer encoding. For reference, here are some outputs WITHOUT the Softmax activation (batch size = 4):

outputs: tensor([[ 0.2439,  0.0890],
                 [ 0.2258,  0.1119],
                 [-0.2149,  0.2282],
                 [ 0.0222, -0.1259]])

And here are some outputs WITH Softmax (Softmax activation before Cross-Entropy):

outputs: tensor([[0.3662, 0.6338],
                 [0.4209, 0.5791],
                 [0.4611, 0.5389],
                 [0.5497, 0.4503]])

As expected, the elements of the output tensor lie in the range [0, 1] and each row sums to 1. Honestly, I see nothing remarkable about the loss values in either situation.

But, the whole point is to know whether I can use integer encoding with Softmax + Cross-Entropy in PyTorch.
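For completeness, here is a minimal sketch of the call I mean, feeding the raw (no-Softmax) outputs quoted above into `nn.CrossEntropyLoss` together with integer-encoded labels (the label values here are made up purely for illustration):

```python
import torch
import torch.nn as nn

# The raw (no-Softmax) outputs quoted above.
outputs = torch.tensor([[ 0.2439,  0.0890],
                        [ 0.2258,  0.1119],
                        [-0.2149,  0.2282],
                        [ 0.0222, -0.1259]])

# Integer (class-index) encoding -- no one-hot; values are hypothetical.
labels = torch.tensor([0, 1, 1, 0])

criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)  # accepts class indices directly
print(loss.item())
```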

Thanks.


Do keep in mind that CrossEntropyLoss does a softmax for you. (It's actually a LogSoftmax + NLLLoss combined into one function; see CrossEntropyLoss — PyTorch 1.9.0 documentation.) Applying a Softmax activation before cross entropy amounts to doing it twice, which can compress the values toward each other, as so:

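A quick sketch of that double-softmax effect (the input values are chosen arbitrarily for illustration): feeding probabilities through softmax a second time compresses them toward a uniform distribution.

```python
import torch

logits = torch.tensor([[3.0, -1.0, 0.5]])

once = torch.softmax(logits, dim=1)   # a clearly peaked distribution
twice = torch.softmax(once, dim=1)    # noticeably flatter

print(once)
print(twice)
```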

As for your question: are you saying there's no point in using softmax since your outputs without it are already between 0 and 1? Technically, I guess you don't need it, but using an activation function is a nice way to enforce bounds in case your network spits out something unexpected.


Hello,

Thank you for your response. Yes, I do know that CrossEntropyLoss has a softmax "embedded". Without the Softmax, the outputs are not necessarily between 0 and 1. See below:

outputs: tensor([[ 0.2439,  0.0890],
                 [ 0.2258,  0.1119],
                 [-0.2149,  0.2282],
                 [ 0.0222, -0.1259]])

The piece of code related to this post looks like this:

def train_model(model, criterion, ...):
    ...
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)   # predicted class indices
    loss = criterion(outputs, labels)  # labels are integer-encoded
    ...


# main code

my_model = my_model.to(device)
my_criterion = nn.CrossEntropyLoss()
...
my_model = train_model(my_model, my_criterion,...)


Above, I use integer encoding. I am just wondering whether I can use integer encoding with Softmax + Cross-Entropy in PyTorch. The point is that some authors, using frameworks other than PyTorch, state that we MUST use one-hot encoding for binary classification because, otherwise, we may end up with all outputs equal. For instance, see this Stack Overflow post (Keras): python - keras CNN same output - Stack Overflow

Hence, the explanation there is an incompatibility between softmax as the output activation and binary_crossentropy as the loss function. To solve it, we must rely on one-hot encoding, otherwise we get all outputs equal (this is what I read). But I used Cross-Entropy here, and in my case, as shown above, the outputs are not equal. Hence, it seems there is no problem in using integer encoding in PyTorch in a situation like this. But I would like to be sure.
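To convince myself, I put together a small sketch comparing CrossEntropyLoss with integer-encoded targets against a manual one-hot cross-entropy (the logits are the first two rows from my outputs; the labels are hypothetical). If the two losses match, nothing is lost by skipping one-hot encoding:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[ 0.2439, 0.0890],
                       [-0.2149, 0.2282]])

labels = torch.tensor([0, 1])                       # integer encoding
one_hot = F.one_hot(labels, num_classes=2).float()  # same targets, one-hot

# Built-in loss with class indices.
ce_integer = nn.CrossEntropyLoss()(logits, labels)

# Manual cross-entropy against the one-hot targets.
ce_one_hot = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(ce_integer, ce_one_hot))  # True
```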

Thanks.


Oh sorry, I misunderstood what you said. Yeah, from my experience I always use integer encoding (e.g. [2, 3, 1, 1] vs. [[0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 1, 0, 0]]) for CrossEntropyLoss and BCELoss 🙂

The docs for CrossEntropyLoss and BCELoss / BCEWithLogitsLoss both show examples using integer encoding for the labels rather than one-hot encoding.

The predictions that go into these loss functions are just the raw outputs from the model (raw logits for CrossEntropyLoss and BCEWithLogitsLoss; note that BCELoss expects probabilities in [0, 1], so it needs a sigmoid first).
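A minimal sketch of the binary case with BCEWithLogitsLoss, assuming made-up logits and targets: each sample gets one raw logit and one float 0./1. label, so again no one-hot encoding is needed.

```python
import torch
import torch.nn as nn

# Hypothetical raw model outputs (one logit per sample) and float labels.
logits = torch.tensor([0.3, -1.2, 0.8, 0.1])
targets = torch.tensor([1., 0., 1., 0.])  # plain 0./1. labels, not one-hot

# Sigmoid is applied internally, so raw logits go straight in.
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```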

Thank you again for your reply.


No problem! I learned something new too haha