# `Softmax` before or after loss calculation

I have a basic question.
Should `softmax` be applied before or after the loss calculation? I have seen many threads discussing this topic for `Softmax` and `CrossEntropyLoss`, but my question is more general, i.e. about using `Softmax` with any loss function. So:

1. Is it a rule of thumb that if softmax is used, it should only be used before (or after) the loss calculation?
2. If it is not a rule of thumb, which gives better results: applying it before or after the loss calculation?

For example, in the code below, should the `Softmax` be applied at Line 1 or at Line 2?

```python
import numpy
import torch

_softmax = torch.nn.Softmax(dim=1)
for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()  # Set model to training mode
        else:
            model.eval()   # Set model to evaluate mode
        num_data = 0
        num_corrects = 0
        _loss = []
        for data in dataloaders[phase]:
            # Every data instance is an input + label pair
            inputs, true_labels = data

            # Make predictions for each batch
            predictions = model(inputs)

            # Line 1: ------- predictions = _softmax(predictions) ---------------
            # Compute the loss for each batch
            loss = loss_fn(predictions, true_labels)
            # Line 2: ------- predictions = _softmax(predictions) ----------------

            # Calculate predicted labels for each batch
            _, pred_labels = torch.max(predictions.data, 1)

            if phase == 'train':
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            _loss.append(loss.item())

            # Count samples for each epoch
            num_data += true_labels.size(0)

            # Count correctly predicted labels for each epoch
            num_corrects += torch.sum(pred_labels == true_labels.data.to(device))
        epoch_loss = numpy.mean(_loss)
        epoch_accuracy = 100 * num_corrects / num_data
```

(Also tell me if any step is wrong in the calculation of `epoch_loss` or `epoch_accuracy`.)

1. It depends on the loss function and whether it defines the softmax operation internally. The issue with `F.softmax` and `nn.CrossEntropyLoss` in PyTorch is that `nn.CrossEntropyLoss` applies `F.log_softmax` internally, so no preceding `F.softmax` operation should be used. It’s thus not the user’s choice if and where to use the softmax; the loss function’s definition determines it.
2. Same as before: applying an unnecessary `F.softmax` before the internal `F.log_softmax` of `nn.CrossEntropyLoss` is wrong. Some users have reported beneficial results, but these could likely have been achieved by other means, e.g. lowering the learning rate.
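A minimal sketch of this point (the tensors here are made up for illustration): `nn.CrossEntropyLoss` on raw logits is equivalent to `F.log_softmax` followed by `nn.NLLLoss`, while feeding it already-softmaxed values produces a different loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)           # raw model outputs: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])

# nn.CrossEntropyLoss expects raw logits: it applies log_softmax internally
ce = torch.nn.CrossEntropyLoss()(logits, targets)

# Equivalent manual pipeline: log_softmax + NLLLoss
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))       # True: the two losses match

# Applying softmax before the loss double-normalizes and changes the value
wrong = torch.nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)
print(torch.allclose(ce, wrong))     # False
```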

Thanks @ptrblck. You are the best. Clear-cut answer.

Sure, happy to help!
One addition: you are of course totally free to still apply `F.softmax` on the model’s output if you want to use the probabilities in some debugging step, e.g. to visualize them. You should however not pass these probabilities to e.g. `nn.CrossEntropyLoss`.
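A sketch of that pattern (tensors are illustrative): keep the raw logits for the loss and derive probabilities separately only for inspection. Since softmax is monotonic, the predicted class is the same either way, so the extra softmax is never needed for accuracy.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])

# The loss sees raw logits only
loss = torch.nn.CrossEntropyLoss()(logits, targets)

# Probabilities computed purely for inspection/visualization
probs = F.softmax(logits, dim=1)

# softmax is monotonic: argmax over logits == argmax over probabilities
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)
print(torch.equal(pred_from_logits, pred_from_probs))  # True
```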
