Where in the model to place torch.nn.Softplus or torch.softmax

I am wondering where exactly to place functions like nn.Softplus or torch.softmax in specific models, and why.

In model descriptions I have seen that people usually apply these kinds of functions at the end of a model (after the last Linear layer).

For my problem this doesn't seem to work.
I get an output from my vgg_feature_extractor, extract some ROIs (regions of interest), and send these to my two models.

Model A: Classifier (one single Linear layer; nothing else)
with loss: criterion_classification = nn.CrossEntropyLoss()

Model B: Distance_regressor (three Linear layers; nothing else)
with loss: criterion_distance = nn.SmoothL1Loss(reduction='mean')
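
To make the setup more concrete, the two heads roughly look like this (simplified sketch; feat_dim, hidden and the layer sizes are just placeholders):

    import torch.nn as nn

    # Simplified sketch of the two heads; feat_dim, hidden and num_classes are placeholders.
    feat_dim, hidden, num_classes = 512 * 7 * 7, 256, 9

    # Model A: a single Linear layer that outputs raw class scores (logits).
    classifier = nn.Linear(feat_dim, num_classes)

    # Model B: three Linear layers; where to put the Softplus is exactly the open question.
    distance_regressor = nn.Sequential(
        nn.Linear(feat_dim, hidden),
        nn.Linear(hidden, hidden),
        nn.Linear(hidden, 1),
    )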

If I add a Softmax to the end of model A and train it, the train_acc gets stuck around 0.57x and does not improve anymore.

The correct code seems to be (classifier only)…

        # 1. Forward pass: model A ends with the Linear layer, so this is raw logits
        output_class_pred = classifier(rois_out)

        # 2. Calculate and accumulate loss (loss_fn = nn.CrossEntropyLoss(), applied to raw logits)
        loss = loss_fn(output_class_pred, class_label_dl)
        train_loss += loss.item()

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

        # Softmax is only applied here to get class predictions for the accuracy metric,
        # not before the loss.
        y_pred_class_no = torch.argmax(torch.softmax(output_class_pred, dim=1), dim=1)
        train_acc += (y_pred_class_no == class_label_dl).sum().item() / len(output_class_pred)

    # Adjust metrics to get average loss and accuracy per batch
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
  • (The output of model A is just the output of the Linear layer, i.e. raw logits, in this scenario.)

If I do it this way, as quoted, it seems to work. After a couple of epochs I get 87–89 % train_acc (with 9 classes).

So I am wondering why some people add the Softmax to the end of their model,
and where I should place it correctly …

For my distance_regressor I am still not sure where to place the Softplus (which I want to use to get only positive distance values).

If it is placed at the end of model B (the distance_regressor), before the loss_dist_fn, the predicted distance seems to remain 0 (since the model's output is negative).
– But this placement seems more logical to me, since I only want to get positive values.

If I place it after the loss_dist_fn, the values seem to adjust more quickly towards positive value predictions.
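
To make the two placements explicit, this is roughly what I mean (simplified sketch; distance_regressor and dist_label_dl are placeholders for my model B and the distance targets):

    import torch.nn as nn

    softplus = nn.Softplus()

    # Variant 1: Softplus at the end of model B, before the loss,
    # so the loss already sees only positive predicted distances.
    dist_pred = softplus(distance_regressor(rois_out))
    loss_dist = criterion_distance(dist_pred, dist_label_dl)

    # Variant 2: loss computed on the raw output; Softplus only applied afterwards,
    # when the predicted distance is actually used.
    raw_pred = distance_regressor(rois_out)
    loss_dist = criterion_distance(raw_pred, dist_label_dl)
    dist_pred = softplus(raw_pred)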

Any hint where these two functions should usually be placed (at the end of the model, after the loss_fn, …)?

Applying a softmax at the end of the model is usually wrong, as nn.CrossEntropyLoss expects raw logits as the model output and nn.NLLLoss expects log-probabilities (so F.log_softmax would be needed) for multi-class classification use cases.
This also explains why your training gets stuck when the softmax activation is added.
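
In code the usual pattern looks like this (minimal sketch, reusing classifier, rois_out and class_label_dl from your snippet):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = classifier(rois_out)    # model A ends with the Linear layer -> raw logits

    # Either feed the raw logits directly to CrossEntropyLoss ...
    loss = nn.CrossEntropyLoss()(logits, class_label_dl)

    # ... or, equivalently, use log-probabilities with NLLLoss.
    log_probs = F.log_softmax(logits, dim=1)
    loss = nn.NLLLoss()(log_probs, class_label_dl)

    # Softmax (or simply argmax on the logits) is only needed to get predictions
    # for metrics or inference, never before the loss.
    preds = torch.argmax(logits, dim=1)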

Thanks a lot!!

  1. So where should I place the Softplus for model B (regression) relative to the nn.SmoothL1Loss()?

And by the way:
2) Is there a clean way to exclude the prediction output for a specific class from the distance-regression loss calculation?
The loss for the distance_regressor should only be calculated for classes 0–7, but not for class 8 (see the rough sketch below for what I had in mind).
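
For illustration, I could imagine masking the loss per sample, roughly like this (dist_pred and dist_label_dl are placeholders for my predicted and target distances) – but is that the recommended way?

    # Only compute the distance loss for samples whose class label is 0–7
    # and skip class 8 via a boolean mask.
    mask = class_label_dl != 8
    if mask.any():
        loss_dist = criterion_distance(dist_pred[mask], dist_label_dl[mask])
    else:
        loss_dist = torch.tensor(0.0, device=dist_pred.device)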