What could cause the output of a linear layer before a sigmoid to return near-opposite values?

Hi everyone, pardon me, I'm not very experienced at training models. I'm training a model with the following architecture:

```python
out = self.conv2_dw(out)
out = self.conv_23(out)
out = self.conv_3(out)
out = self.conv_34(out)
out = self.conv_4(out)
out = self.conv_45(out)
out = self.conv_5(out)
out = self.conv_6_sep(out)
out = self.conv_6_dw(out)
out = self.conv_6_flatten(out)
if self.embedding_size != 512:
    out = self.linear(out)
out = self.bn(out)
out = self.drop(out)
out = self.prob(out)
print("self.prob", out)
return out
```

Given that `self.prob` is a linear layer, the model gradually converged to a good result.
I stopped at 25 epochs because I feared that continuing training would lead to overfitting.

Why, when I run the test, is the output at `self.prob` a pair of near-opposite values?

```
self.prob tensor([[ 4.2354, -4.1672]], device='cuda:0')
sigmoid =>  [[0.9857328  0.01525886]]
```

Any answer is highly appreciated. Many thanks

I don’t completely understand the question.

The output in the test case is opposite to which value?

It means I'm training a model to classify whether an image is fake or real.
The last layer of my model is an `nn.Linear`, before I feed its output to softmax.
During training, the model seems to converge well, since the loss decreases gradually.
In the test phase, I print the output after the linear layer, as shown above.
The output always looks as if it were drawn from a U[-k, k] distribution (like `self.prob tensor([[ 4.2354, -4.1672]])` above).
I'm new to training models, so if my question is hard to understand, pardon me. Your help is always highly appreciated.

Thanks for the clarification.
If I understand the question correctly, you are wondering why the output values of the last linear layer take values such as `[4.2354, -4.1672]`?
The last linear layer outputs logits, which are raw prediction values and are not bounded to a specific range.
The lower the value, the lower the probability, and vice versa.
In your case, your model is pretty confident that the current sample belongs to class 0. You can turn the logits into probabilities with `softmax` and see that the class 0 probability is close to 1. Note that the values you printed come from applying `sigmoid` element-wise to each logit, which yields ~98.6% for class 0.
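As a quick sanity check, both kinds of numbers can be reproduced from the printed logits. Here is a minimal pure-Python sketch (the logit values are simply copied from the printout above):

```python
import math

# Logits copied from the printout above
z0, z1 = 4.2354, -4.1672

# Element-wise sigmoid: each logit is squashed independently,
# so the two "probabilities" need not sum to 1
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

p_sig = [sigmoid(z0), sigmoid(z1)]  # ~[0.9857, 0.0153], matching the printout

# Softmax: normalizes across both classes, so the probabilities sum to 1
e0, e1 = math.exp(z0), math.exp(z1)
p_soft = [e0 / (e0 + e1), e1 / (e0 + e1)]  # class 0 ends up close to 1
```

For a two-class problem, softmax over two logits is equivalent to a sigmoid applied to their difference, which is why the two transforms give different numbers for the same logits here.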

Thanks, I understand this issue now. I am also wondering: if the output isn't forced into a U[-k, k] distribution, is that better or worse?

The output is not forced to follow a specific distribution. I'm not sure which use case would be better or worse. Could you explain what you mean?

It looks like `nn.Linear` is initialized from U[-k, k], so I assumed its output would also be confined to [-k, k]. I don't know whether that is good or not.
What I'm really asking about is your experience with how to initialize weights for linear layers, since good weight initialization seems important for the model.

The initialization doesn't create a hard bound on the output values, since the range of the input values also determines the output. It does, of course, have some influence on the output.

I don't know what the currently most popular weight init method for linear layers is and would try to stick to well-performing models and their init methods.
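To illustrate why the uniform init does not bound the output: each weight lies in [-k, k], but the output is a sum of many weighted inputs. Here is a minimal pure-Python sketch; the `fan_in` value and the all-ones input are illustrative, while the bound k = 1/sqrt(fan_in) matches PyTorch's default `nn.Linear` init:

```python
import random

# Illustrative layer width; PyTorch's default nn.Linear init draws
# weights from U(-k, k) with k = 1 / sqrt(fan_in)
fan_in = 512
k = fan_in ** -0.5

random.seed(0)
weights = [random.uniform(-k, k) for _ in range(fan_in)]

# Every individual weight lies in [-k, k] ...
assert all(-k <= w <= k for w in weights)

# ... but the output is a sum of fan_in weighted terms, so its
# worst-case magnitude is fan_in * k = sqrt(fan_in), far above k
x = [1.0] * fan_in  # all-ones input, chosen for illustration
out = sum(w * xi for w, xi in zip(weights, x))
```

So the uniform init bounds the individual weights, not the layer's output; and during training the weights move away from their initial range anyway.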


Thanks for your sharing and support of the PyTorch community. Keep it up!
