How to return output values only from 0 to 1?

Given a convolutional neural network:


import torch
import torch.nn as nn


class ConvNet(nn.Module):
    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        self.num_classes = num_classes
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer2 = nn.Sequential(
            nn.Conv2d(12, 12, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer3 = nn.Sequential(
            nn.Conv2d(12, 12, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(12),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(12, 12, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(12),
            nn.ReLU())
        self.fc = nn.Linear(896 * 12, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

Is there any way to have the neural network output values between 0 and 1 from the forward function? I have tried the following:

  1. Added a sigmoid activation to the final layer (e.g. torch.sigmoid(out) on the fc output), but this does not solve the problem, as the network is unable to train in this configuration (a rough sketch of this attempt is shown just after this list).
  2. Trained the network on target values transformed by a logit function, then applied a sigmoid activation to map the outputs back to the required range. Again, the network is unable to train in this configuration.
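
Roughly, attempt 1 looked like the sketch below (the exact placement of the sigmoid in my code may differ slightly):

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        out = torch.sigmoid(out)  # every entry squashed into (0, 1), but the network no longer trains
        return out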

[additional information:

Training uses PoissonNLLLoss and the network can accurately classify objects. However, it needs to output probabilities between 0 and 1 instead of the current range of roughly -60 to 3. The class probabilities in the output array are independent of each other, so a softmax layer will not work.

An example output by the NN at the moment:
[2.4, -53.12, 0.53, -3.59]

what an output should look like:
[0.32, 0.00, 0.58, 0.92]

]

Thank you!

Your best bet would be to use nn.CrossEntropyLoss and experiment with various initialization schemes.
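
A minimal sketch of how nn.CrossEntropyLoss is typically used (shapes and names here are illustrative, not taken from the question): it consumes the raw, unbounded logits plus integer class indices and applies log-softmax internally, so no activation is added to the model's output layer:

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    logits = torch.randn(8, 4, requires_grad=True)   # raw network output (stand-in for model(x))
    targets = torch.randint(0, 4, (8,))              # one class index per sample
    loss = criterion(logits, targets)                # log-softmax + NLL happen inside the loss
    loss.backward()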


You only need to apply the Sigmoid activation after the network has converged, i.e. in the testing stage.
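
A minimal sketch of that suggestion (the tiny nn.Linear here is just a stand-in for the ConvNet above; names are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 4)                 # stand-in for the trained ConvNet

    model.eval()
    with torch.no_grad():
        logits = model(torch.randn(8, 10))   # raw, unbounded scores
        probs = torch.sigmoid(logits)        # each entry independently mapped into (0, 1)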


Hello. I am a little bit confused. If I want to scale the output to (0, 1), I would add a Sigmoid to the output during both the training and testing stages. But you are saying we should add this activation function only in the testing stage, not in both. Am I right?

You can do that after each training epoch/iteration, but I believe it shouldn't be part of the training itself. That is, if you would like to measure the performance of the model on the training data after a few epochs, you can add the Sigmoid to scale the output. You should not back-propagate through the Sigmoid, as training can be severely hindered. The best way is to try both approaches, i.e. with and without the Sigmoid during training, and see what happens.

Update: I am assuming the use of a loss function that embeds an activation function, like the Softmax loss.

What the Sigmoid does is a kind of squashing operation.
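
For concreteness, a sketch of the "scale for monitoring only" idea (the loss and model here are stand-ins): the back-propagated loss sees the raw output, and the sigmoid is applied only under torch.no_grad() to inspect the predictions:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 4)             # stand-in for the real network
    criterion = nn.PoissonNLLLoss()      # the loss mentioned in the original question

    x, target = torch.randn(8, 10), torch.rand(8, 4)
    out = model(x)
    loss = criterion(out, target)        # gradients flow through the raw output only
    loss.backward()

    with torch.no_grad():
        probs = torch.sigmoid(out)       # squashed into (0, 1) purely for monitoring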

Thank you for your response, and sorry for my naive questions; I still don't quite understand.
For example, suppose my ground truth is in [0, 1]. After the final fully connected layer, I would like to scale the output into [0, 1], so I add a Sigmoid after the FC layer. I then compare the prediction with the ground truth and back-propagate the loss to update the network.
If, during training, I only use the final FC layer to produce the prediction and the network fits well, but during the testing stage I add a Sigmoid to squash the FC output, couldn't it happen that the original performance is good, yet with the Sigmoid the performance is not as good as in the original setup?
Thank you again for your response. Maybe my idea and understanding are naive and trivial.

If you are using Softmax Loss, which is actually just a Softmax activation plus a Cross-Entropy Loss, then the activation function is part of the loss during training. Your output will still be linear if you are using nn.Linear at the output layer. Now, to be honest, I am not sure you will get [0, 1] outputs even if your training targets are between [0, 1]. The best way is to print out the output values after your model converges, and if they are not bounded between [0, 1], use the Softmax (not the Sigmoid) to bound them between [0, 1]. Since the Softmax loss already implements the Softmax, the model behaviour will be the same during training and testing.
NB. Maybe I was not clear in my previous reply when I said you should not back-propagate through the Sigmoid; I was subconsciously assuming that the Softmax (or Sigmoid) activation is already part of the loss function. Sorry for the confusion.
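
A small sketch of that last point (tensors are illustrative): nn.CrossEntropyLoss is exactly log-softmax followed by a negative log-likelihood loss, so the model keeps a linear output during training and an explicit softmax is applied only when probabilities are needed:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = torch.randn(8, 4, requires_grad=True)
    targets = torch.randint(0, 4, (8,))

    loss_a = nn.CrossEntropyLoss()(logits, targets)
    loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)
    assert torch.allclose(loss_a, loss_b)            # the softmax already lives inside the loss

    probs = F.softmax(logits.detach(), dim=1)        # rows sum to 1, every entry in [0, 1]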