I have a basic question.

Should `softmax` be applied before or after loss calculation? I have seen many threads discussing this topic for `Softmax` and `CrossEntropy Loss` specifically, but my question is more general, i.e. about using `Softmax` with any loss function (the `CrossEntropyLoss` case, which I already understand, is sketched after the list below). So:

- Is it a rule of thumb that softmax, if used, should only be applied before (or after) loss calculation?
- If it is not a rule of thumb, which gives better results: applying it before or after loss calculation?
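For reference, here is my current understanding of the `CrossEntropyLoss` case (a minimal sketch; `logits` and `target` are made-up toy tensors): this loss already applies `log_softmax` internally, so applying `Softmax` before it normalizes twice.

```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)           # raw model outputs: batch of 4, 3 classes
target = torch.tensor([0, 2, 1, 0])  # ground-truth class indices

# CrossEntropyLoss is log_softmax + NLLLoss applied to raw logits
a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(a, b))  # True

# Applying Softmax *before* this loss normalizes twice
c = F.cross_entropy(F.softmax(logits, dim=1), target)
print(torch.allclose(a, c))  # False (in general): the loss values differ
```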

In the code below, should the `Softmax` be applied at **Line 1** or at **Line 2**?

```
import numpy
import torch
from tqdm import tqdm

_softmax = torch.nn.Softmax(dim=1)
for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()  # Set model to training mode
        else:
            model.eval()   # Set model to evaluation mode
        num_data = 0
        num_corrects = 0
        _loss = []
        for i, data in enumerate(tqdm(dataloader[phase])):
            # Every data instance is an input + label pair
            inputs, true_labels = data
            inputs = inputs.to(device)
            true_labels = true_labels.to(device)
            # Zero your gradients for every batch!
            optimizer.zero_grad()
            with torch.set_grad_enabled(phase == 'train'):
                # Make predictions for each batch
                predictions = model(inputs)
                # Line 1: ------- predictions = _softmax(predictions) -------
                # Compute the loss for each batch
                loss = loss_fn(predictions, true_labels)
                # Line 2: ------- predictions = _softmax(predictions) -------
                # Calculate predicted labels for each batch
                _, pred_labels = torch.max(predictions, 1)
                if phase == 'train':
                    # Compute loss gradients
                    loss.backward()
                    # Adjust learning weights
                    optimizer.step()
            _loss.append(loss.item())
            # Count the samples seen this epoch
            num_data += true_labels.size(0)
            # Count the correctly predicted labels this epoch
            num_corrects += torch.sum(pred_labels == true_labels).item()
        epoch_loss = numpy.mean(_loss)
        epoch_accuracy = 100 * num_corrects / num_data
```
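One thing I believe (please correct me if wrong): for the accuracy computation itself the placement should not matter, since `Softmax` is monotonically increasing within each row and therefore does not change the `argmax`. A quick sketch with a made-up tensor:

```
import torch

logits = torch.randn(4, 3)
probs = torch.nn.Softmax(dim=1)(logits)

# Softmax preserves the ordering within each row, so argmax is identical
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))  # True
```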

(Also, please tell me if any step in calculating `epoch_loss` or `epoch_accuracy` is wrong.)
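One step I was already unsure about: `epoch_loss` is a plain mean of per-batch losses, which I think is only exact when every batch has the same size (the last batch of an epoch is often smaller). A sketch with made-up numbers:

```
import numpy

# Made-up per-batch losses and batch sizes, purely for illustration
batch_losses = [0.9, 0.7, 0.4]
batch_sizes = [32, 32, 8]  # the last batch is smaller

naive_mean = numpy.mean(batch_losses)
weighted_mean = numpy.average(batch_losses, weights=batch_sizes)
print(naive_mean, weighted_mean)  # they differ when batch sizes differ
```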