I am wondering where exactly to place functions like nn.Softplus or nn.Softmax in a specific model, and why.
In model descriptions I have seen that people usually apply these kinds of functions at the end of a model (after the last Linear layer).
For my problem, that does not seem to work.
I get an output from my vgg_feature_extractor, extract some ROIs (regions of interest), and send these to my two models.
Model A: Classifier (a single Linear layer, nothing else),
with loss: criterion_classification = nn.CrossEntropyLoss()
Model B: Distance_regressor (three Linear layers, nothing else),
with loss: criterion_distance = nn.SmoothL1Loss(reduction='mean')
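To make the setup concrete, here is a minimal sketch of the two heads as described above. The feature size, the hidden width, the batch of random ROIs, and the target distances are assumptions chosen only for illustration; only the 9 classes, the layer counts, and the two loss functions come from the post.

```python
import torch
from torch import nn

# Hypothetical sizes: the post does not state them, apart from the 9 classes.
ROI_FEATURES = 512   # assumed length of one flattened ROI feature vector
NUM_CLASSES = 9      # number of classes mentioned in the post
HIDDEN = 64          # assumed hidden width of the regressor

# Model A: a single Linear layer that returns raw class scores (logits).
classifier = nn.Linear(ROI_FEATURES, NUM_CLASSES)

# Model B: three Linear layers and nothing else, as described.
distance_regressor = nn.Sequential(
    nn.Linear(ROI_FEATURES, HIDDEN),
    nn.Linear(HIDDEN, HIDDEN),
    nn.Linear(HIDDEN, 1),
)

criterion_classification = nn.CrossEntropyLoss()
criterion_distance = nn.SmoothL1Loss(reduction='mean')

# Stand-in data: 4 random ROIs with made-up labels and distances.
rois_out = torch.randn(4, ROI_FEATURES)
class_label_dl = torch.randint(0, NUM_CLASSES, (4,))
distance_label = torch.rand(4, 1) * 10.0

# Both losses are computed directly on the raw outputs of the heads.
loss_cls = criterion_classification(classifier(rois_out), class_label_dl)
loss_dist = criterion_distance(distance_regressor(rois_out), distance_label)
```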
If I add the Softmax to the end of model A and train it, the train_acc gets stuck at 0.57x and stops moving.
The right code seems to be (classifier only):
# 1. Forward pass (raw output of the Linear layer, no Softmax)
output_class_pred = classifier(rois_out)
# 2. Calculate and accumulate loss
loss = loss_fn(output_class_pred, class_label_dl)
train_loss += loss.item()
# 3. Optimizer zero grad
optimizer.zero_grad()
# 4. Loss backward
loss.backward()
# 5. Optimizer step
optimizer.step()
# 6. Calculate and accumulate accuracy for this batch
y_pred_class_no = torch.argmax((torch.softmax(output_class_pred, dim=1)), dim=1)
train_acc += (y_pred_class_no == class_label_dl).sum().item()/len(output_class_pred)
# Adjust metrics to get average loss and accuracy per batch
train_loss = train_loss / len(dataloader)
train_acc = train_acc / len(dataloader)
(In this scenario, the output of modelA comes directly from the Linear layer.)
If I do it as quoted, it seems to work. After a couple of epochs I get 87–89 % train_acc (with 9 classes).
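One plausible reason the version without Softmax behaves better: nn.CrossEntropyLoss already applies log-softmax to its input, so it expects raw logits. The sketch below uses random stand-in tensors (not the real ROI features) to show that equivalence, and that feeding it softmaxed outputs applies softmax a second time.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for output_class_pred and class_label_dl (8 samples, 9 classes).
logits = torch.randn(8, 9)
labels = torch.randint(0, 9, (8,))

loss_fn = nn.CrossEntropyLoss()

# CrossEntropyLoss on raw logits equals NLLLoss on log_softmax(logits).
loss_on_logits = loss_fn(logits, labels)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# With Softmax inside the model, the loss applies softmax a second time
# to values that already lie between 0 and 1.
loss_double = loss_fn(torch.softmax(logits, dim=1), labels)

# Softmax preserves the ordering of the scores, so the predicted class
# is the same with or without it.
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = torch.softmax(logits, dim=1).argmax(dim=1)
```

Because the doubly softmaxed scores are confined to a narrow range, the loss can only move within a small band, which may be why training stalls. For the accuracy computation, the argmax can be taken on the logits directly; Softmax is only needed when actual probabilities are wanted, for example at inference time.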
So I am wondering why some people add the Softmax to the end of their model. Or where should I place it correctly?
For my distance_regressor I am still not sure where to place the Softplus (which I want to use so that I only get positive distance values).
If it is placed at the end of modelB (the distance_regressor), before the loss_dist_fn, the predicted_distance seems to remain at 0 (since the model's raw output is negative). This placement still seems more logical to me, since I want to get positive values only.
If I place it after the loss_dist_fn, the values seem to adjust more quickly to positive predictions.
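For reference, a minimal sketch of the first placement: Softplus as the last layer of the regressor, so the loss sees the positive output. The layer sizes, the random input, and the target value of 5.0 are assumptions made up for the example. It also compares Softplus with ReLU on a negative input, since Softplus returns a small positive number there rather than exactly 0.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical regressor (sizes assumed) with Softplus as the last layer.
distance_regressor = nn.Sequential(
    nn.Linear(16, 8),
    nn.Linear(8, 8),
    nn.Linear(8, 1),
    nn.Softplus(),   # log(1 + exp(x)), positive for these inputs
)
loss_dist_fn = nn.SmoothL1Loss(reduction='mean')

rois_out = torch.randn(4, 16)               # stand-in ROI features
target_distance = torch.full((4, 1), 5.0)   # made-up target distances

# The loss is computed on the Softplus output, so gradients flow through it.
predicted_distance = distance_regressor(rois_out)
loss = loss_dist_fn(predicted_distance, target_distance)
loss.backward()

# Softplus of a negative number is small but not zero; ReLU gives exactly 0.
raw = torch.tensor([-5.0, 0.0, 5.0])
softplus_values = nn.Softplus()(raw)
relu_values = torch.relu(raw)
```

A Softplus applied only after the loss has been computed is not part of the graph that the loss backpropagates through, so in that arrangement the network is trained on its raw outputs and the Softplus acts as post-processing.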
Any hints on where these two functions should usually be placed? (end of the model, after the loss_fn, …)