[HELP] output layer with softmax in pytorch

This is my network:

import torch.nn as nn

class ResNetWithBottleneck(nn.Module):
    def __init__(self, dim_bottleneck_layer, num_class):
        super(ResNetWithBottleneck, self).__init__()
        # base pretrained resnet
        self.base_resnet = ResNetRvLinear()
        self.base_resnet = make_pretrained_resnet50(self.base_resnet)
        # add fc_bottleneck layer
        self.dim_bottleneck_layer = dim_bottleneck_layer
        self.fc_bottleneck = nn.Linear(2048, dim_bottleneck_layer)

        self.fc_class = nn.Linear(dim_bottleneck_layer, num_class)
        self.softmax = nn.Softmax(dim=1)

        # init weight

    def forward(self, x):
        x = self.base_resnet(x)
        x = self.fc_bottleneck(x)
        y = self.fc_class(x)
        y = self.softmax(y)
        return x, y

When I remove the last line of forward, y = self.softmax(y), the code runs well.

When I add y = self.softmax(y) back, the loss does not decrease.

I don't know what the problem is.

By the way, I would also like to know why none of torchvision's networks use softmax in the last output layer.



I found some information on this: the loss function already computes softmax, so adding an additional softmax slows down the decrease of the loss (although the loss still decreases).
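To make the effect of the extra softmax concrete, here is a minimal sketch. The 10 classes and the hand-picked logits are assumptions for illustration, not values from the model above. nn.CrossEntropyLoss applies log_softmax to whatever it receives, so if it is handed probabilities in [0, 1] instead of logits, even a perfectly confident prediction leaves a large loss:

```python
import math

import torch
import torch.nn as nn

num_class = 10  # assumed number of classes, for illustration only
criterion = nn.CrossEntropyLoss()
target = torch.tensor([0])

# Confident, correct logits: class 0 has a much larger score.
logits = torch.zeros(1, num_class)
logits[0, 0] = 20.0

# Loss on raw logits (the intended use): close to zero.
loss_logits = criterion(logits, target).item()

# Loss when softmax is applied first: the loss only sees values in [0, 1],
# so its internal softmax can never become confident.
probs = torch.softmax(logits, dim=1)
loss_double = criterion(probs, target).item()

# Floor reached for a perfect one-hot "probability" input:
# -log(e / (e + (C - 1))) = log(1 + (C - 1) / e)
floor = math.log(1 + (num_class - 1) / math.e)
# Loss of a uniform (untrained) prediction, for comparison.
start = math.log(num_class)

print(loss_logits, loss_double, floor, start)
```

With 10 classes the loss can only move from about 2.30 down to about 1.46 when softmax is applied twice, which would look like a loss that barely decreases.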

I reviewed some PyTorch code.

In yunjey/pytorch-tutorial's GAN code:

D = nn.Sequential(
    nn.Linear(image_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, 1),
    nn.Sigmoid())

criterion = nn.BCELoss()

Why should a sigmoid function be added when predicting a binary class?


This is at line 48 of that code, in the prediction step.

Hi, for the first question, which loss function are you using?
For the second question:
For binary classification, the logistic function (a sigmoid) and softmax will perform equally well, but the logistic function is mathematically simpler and hence the natural choice. When you have more than two classes, you can’t use a scalar function like the logistic function as you need more than one output to know the probabilities for all the classes, hence you use softmax.
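The equivalence in the binary case can be checked numerically. The following is a small sketch in plain Python (the helper names sigmoid and softmax are made up for this example): a sigmoid over a single logit z gives the same probability as a softmax over the two logits [z, 0].

```python
import math


def sigmoid(z):
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Softmax over a list of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


for z in (-3.0, 0.0, 0.5, 4.0):
    p_sigmoid = sigmoid(z)
    p_softmax = softmax([z, 0.0])[0]  # probability of the "positive" class
    print(z, p_sigmoid, p_softmax)
```

So a single sigmoid output carries the same information as a two-class softmax, which is why one output unit is enough for binary classification.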

thanks @Xiaoyu_Song !

In my first question, I am doing multi-class prediction with a softmax function as my output layer, and the loss decreases much more slowly (almost not at all) than without softmax. My loss function is CrossEntropyLoss.

Then I figured out that:
With nn.BCELoss, the last layer needs a sigmoid function.
With nn.BCEWithLogitsLoss, no sigmoid function is needed in the last layer.
With nn.CrossEntropyLoss, no nn.Softmax(dim=1) is needed in the last layer, because the loss function already includes the softmax function.
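These pairings can be verified with a quick sketch. The tensors below are random, made-up values, not data from the thread. It compares nn.BCELoss on sigmoid outputs against nn.BCEWithLogitsLoss on raw logits, and nn.CrossEntropyLoss on raw logits against nn.NLLLoss on log_softmax outputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Binary case: one logit per sample, float targets in {0, 1}.
binary_logits = torch.randn(4, 1)
binary_target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
bce = nn.BCELoss()(torch.sigmoid(binary_logits), binary_target)
bce_logits = nn.BCEWithLogitsLoss()(binary_logits, binary_target)

# Multi-class case: one logit per class, integer class indices as targets.
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])
ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

print(bce.item(), bce_logits.item())  # the two values match
print(ce.item(), nll.item())          # the two values match
```

In both pairs the "with logits" variant is usually the safer choice, since it computes the log of the sigmoid or softmax in one numerically stable step.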

Today I'm doing CNN multi-class prediction, and I want to output the probability of every class. In PyTorch, nn.CrossEntropyLoss contains a log_softmax(), and nn.NLLLoss
also needs log_softmax() in the last layer, so there seems to be no loss function that takes a plain softmax output.
But I can train the model as usual
when the last layer is just an nn.Linear() layer.
Finally, when I want to get the softmax probabilities, I can do this:
probability = torch.nn.functional.softmax(out_put, dim=1)
Now the probability is the same as what you get from softmax in TensorFlow or Keras.
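Put together, the workflow described above might look roughly like this. It is a sketch under assumptions: the tiny model, the input shape, and the labels are all invented for the example, and only one loss computation is shown instead of a full training loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the CNN: the last layer is a plain nn.Linear producing logits.
model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 5))
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 1, 8, 8)     # made-up batch
labels = torch.tensor([0, 3, 1, 4])  # made-up class indices

# Training: pass the raw logits to the loss.
out_put = model(images)
loss = criterion(out_put, labels)
loss.backward()

# Inference: apply softmax only to read off probabilities.
model.eval()
with torch.no_grad():
    probability = F.softmax(model(images), dim=1)

print(probability.sum(dim=1))  # each row sums to 1
```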


Hi, I have a related question. I am trying to train a two-stream network, so I need to fuse two softmax layers before passing the result to the loss function. But nn.CrossEntropyLoss already includes the softmax function. What should I do to prevent nn.CrossEntropyLoss from automatically applying softmax?
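One possible approach, sketched here under assumptions (the averaging fusion, the random logits, and the clamp value are all choices made for this example, not something from the thread): since nn.CrossEntropyLoss is log_softmax followed by nn.NLLLoss, the softmax step can be done by hand, the two probability vectors fused, and the log of the result passed to nn.NLLLoss, which applies no softmax of its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up logits from the two streams (batch of 4, 3 classes).
logits_a = torch.randn(4, 3, requires_grad=True)
logits_b = torch.randn(4, 3, requires_grad=True)
target = torch.tensor([0, 2, 1, 1])

# Fuse the two softmax outputs; averaging is just one possible choice.
fused = 0.5 * (F.softmax(logits_a, dim=1) + F.softmax(logits_b, dim=1))

# nn.NLLLoss expects log-probabilities and applies no softmax itself.
# The clamp guards against log(0).
loss = nn.NLLLoss()(torch.log(fused.clamp(min=1e-8)), target)
loss.backward()

print(loss.item())
```

Gradients flow back into both streams through the fused probabilities, so both can be trained jointly with this loss.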

Hi, I also have a related question. I want to deploy my model on a mobile phone, so I have to convert it to another framework (like NCNN or MNN), and I want softmax to be the last layer. I'll try training the model without softmax and adding a softmax layer when saving it. After all, the softmax layer has no weights.
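That plan can be sketched as follows, assuming a placeholder network (the single nn.Linear below stands in for the real trained model). The conversion to NCNN or MNN itself is left out; the sketch only shows that appending nn.Softmax adds no parameters and does not change the predicted class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the trained network, ending in a plain nn.Linear (logits).
trained = nn.Sequential(nn.Linear(16, 4))

# nn.Softmax has no learnable parameters, so appending it changes no weights.
softmax = nn.Softmax(dim=1)
num_softmax_params = sum(p.numel() for p in softmax.parameters())

# Wrap the trained model for export; the wrapper outputs probabilities.
export_model = nn.Sequential(trained, softmax).eval()

x = torch.randn(2, 16)
with torch.no_grad():
    probs = export_model(x)

print(num_softmax_params, probs.sum(dim=1))
```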