Add sigmoid layer

Hello,
I'm using a pretrained model and I need to add a classifier layer, but I don't understand how:
1 - Does a Linear layer apply softmax automatically?
2 - Can I use a Linear layer and add a Softmax layer after it?

model = models.video.mc3_18(pretrained=True, progress=True)
set_parameter_requires_grad(model, feature_extract)
# change the output FC layer
model.fc = nn.Sequential(nn.Linear(512, 256),
                         nn.ReLU(),
                         nn.Linear(256, num_classes),
                         nn.Softmax(dim=1))

3 - Is it OK to use cross-entropy loss?
Thank you :slight_smile:

  1. Does a Linear layer apply softmax automatically?

No, a Linear layer just applies a weight matrix and a bias to the inputs; it does not normalize its outputs.
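For example (a quick sanity check, with a made-up 10-class head), the raw outputs of nn.Linear can be negative and don't sum to 1 until you apply softmax yourself:

import torch
import torch.nn as nn

fc = nn.Linear(512, 10)                 # hypothetical 10-class head
x = torch.randn(4, 512)                 # batch of 4 feature vectors
logits = fc(x)                          # raw scores, no normalization applied
probs = torch.softmax(logits, dim=1)    # rows only sum to 1 after softmax
print(logits.sum(dim=1))                # arbitrary values
print(probs.sum(dim=1))                 # tensor([1., 1., 1., 1.])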

  2. Can I use a Linear layer and add a Softmax layer after it?

It is best that you don't, because it can cause a problem when calculating the loss. You could add a condition in your model's forward method so it only applies the softmax operation when it is not training:

if self.training:
    return output
else:
    return torch.nn.functional.softmax(output, dim=1)
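Putting that together, here is a minimal sketch (the module name and sizes are made up for illustration, not your exact model):

import torch.nn as nn
import torch.nn.functional as F

class ClassifierHead(nn.Module):        # hypothetical wrapper, for illustration
    def __init__(self, num_classes):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(512, 256),
                                nn.ReLU(),
                                nn.Linear(256, num_classes))

    def forward(self, x):
        output = self.fc(x)
        if self.training:
            return output                # raw logits for CrossEntropyLoss
        return F.softmax(output, dim=1)  # probabilities only at eval time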
  3. Is it OK to use cross-entropy loss?

In order to train, you will need to give the raw output of nn.Linear(256, num_classes) to the cross-entropy loss object, since the implementation applies LogSoftmax internally when calculating the loss: nn.CrossEntropyLoss
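For example, a minimal sketch with made-up shapes (note the targets are class indices, not one-hot vectors):

import torch
import torch.nn as nn

num_classes = 5                                           # example value
criterion = nn.CrossEntropyLoss()                         # applies LogSoftmax + NLLLoss internally
logits = torch.randn(4, num_classes, requires_grad=True)  # stands in for the raw fc output
targets = torch.randint(0, num_classes, (4,))             # class indices per sample
loss = criterion(logits, targets)                         # no softmax on logits beforehand
loss.backward()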

Thank you @Manuel_Alejandro_Dia,
is it possible to use this head?

model.fc = nn.Sequential(nn.Linear(512, 256),
                         nn.Softmax(dim=1))

Glad I could help! :grin:

You shouldn't use that head during training, since it ends in a Softmax layer.

I would recommend something like:

model.fc = nn.Sequential(nn.Linear(512, 256),
                         nn.ReLU(),
                         nn.Linear(256, num_classes),
                         nn.ReLU())

Remember that you don't have to add the softmax operation in your nn.Sequential, since it would interfere with the softmax applied inside CrossEntropyLoss.


Does ReLU() generate probabilities? Can I use the max probability as the predicted class?

What ReLU does is filter out the negative values; it does not produce probabilities. If you just want to get the predicted class, you can apply output.argmax(1) (or max, as you said).
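For example (a quick sketch with random logits), argmax gives the same class whether you take it on the raw outputs or on the softmax probabilities, since softmax is monotonic:

import torch

output = torch.randn(4, 10)             # e.g. logits for a batch of 4, 10 classes
pred = output.argmax(1)                 # predicted class per sample
probs = torch.softmax(output, dim=1)    # optional, only if you need probabilities
assert torch.equal(pred, probs.argmax(1))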