Hi everyone, please excuse me, I'm not very experienced with training models. I'm training a model with the following architecture:
```python
out = self.conv2_dw(out)
out = self.conv_23(out)
out = self.conv_3(out)
out = self.conv_34(out)
out = self.conv_4(out)
out = self.conv_45(out)
out = self.conv_5(out)
out = self.conv_6_sep(out)
out = self.conv_6_dw(out)
out = self.conv_6_flatten(out)
if self.embedding_size != 512:
    out = self.linear(out)
out = self.bn(out)
out = self.drop(out)
out = self.prob(out)  # final nn.Linear -> raw logits
print("self.prob", out)
return out
```
With self.prob being a Linear layer, the model gradually converged to a good result.
I stopped at 25 epochs because I was afraid the model would start to overfit if I kept training.
Thanks for your reply.
I mean that I'm training a model to classify whether an image is fake or real.
The last layer of my model is nn.Linear, and I feed its output to a softmax.
During training, the model seems to converge well, since the loss decreases and the accuracy improves gradually.
In the test phase, I print the output after the linear layer, as shown above.
The output always looks like it follows a U[-k, k] distribution (e.g. `self.prob tensor([[ 4.2354, -4.1672]])` above).
I'm new to training models, so please pardon me if my question is hard to understand. I highly appreciate your help.
Thanks for the clarification.
If I understand the question correctly, you are wondering why the output values of the last linear layer take values such as [4.2354, -4.1672]?
The last linear layer outputs logits, which are raw prediction values and are not bounded to a specific range.
The lower the value, the lower the probability and vice versa.
In your case your model is pretty confident that the current sample belongs to class0. You can see the probabilities by applying softmax (which you've already done); the class0 probability corresponds to ~99.98%.
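To make that concrete, here is a minimal sketch applying softmax to the logits from your print statement (only the rounding of the result is mine):

```python
import torch

# The logits printed in your test phase
logits = torch.tensor([[4.2354, -4.1672]])

# Softmax maps the unbounded logits to probabilities that sum to 1
probs = torch.softmax(logits, dim=1)
print(probs)  # class0 probability is ~0.9998 (99.98%), class1 ~0.0002
```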
It looks like nn.Linear is initialized from U[-k, k], so I thought its output would always be forced into U[-k, k]. I don't know whether that is good or not.
However, it seems that weight initialization matters for the model. I'm just asking about your experience with how to initialize weights for linear layers.
The initialization isn't creating a hard bound on the output values, since the range of the input values also determines the output. However, it certainly has some influence on the output.
I don't know what the currently most popular weight init method for linear layers is and would try to stick to well-performing models and their init methods.
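To illustrate both points, here is a small sketch (the layer sizes are just placeholders, not taken from your model) showing that the output scale follows the input scale rather than the init range, and how a common scheme such as Xavier uniform can be applied via torch.nn.init:

```python
import torch
import torch.nn as nn

# By default nn.Linear weights are drawn from U[-bound, bound] with
# bound = 1 / sqrt(in_features), but the outputs are NOT bounded to that
# range: they scale with the magnitude of the input.
layer = nn.Linear(512, 2)          # placeholder sizes, not from your model
small_in = torch.randn(1, 512)     # unit-scale input
large_in = 100 * torch.randn(1, 512)  # input 100x larger
print(layer(small_in))  # outputs roughly within a small range
print(layer(large_in))  # outputs ~100x larger, far outside U[-k, k]

# One common alternative init for a final linear layer (Xavier/Glorot uniform):
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)
```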