Outputs from a simple DNN are always the same, regardless of the input

I have built a DNN with only one hidden layer; these are the parameters:

input_size = 100
hidden_size = 20
output_size = 2

def __init__(self):
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.linear2 = nn.Linear(hidden_size, output_size)

def forward(self, x):
    x1 = F.leaky_relu()
    return F.leaky_relu()

# unimportant code omitted
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.02)

Normalized word vectors of size 100 from an authoritative GitHub repository are used as input.
My purpose is to identify whether a word is an event. For example, ‘drought’ is an event but ‘dog’ is not.
After training, the 2-dimensional output tensors are almost the same (e.g. (-0.8, -1.20) vs. (-0.8, -1.21), and (-0.2, -1.01) vs. (-0.2, -1.02)), even if the activation function and loss function are changed.
Could someone tell me the reason? I tried my best but failed to solve it.

Could you check the weight and bias in both layers?
Sometimes, e.g. when the learning rate is too high, the model just learns the “mean prediction”, i.e. the bias is responsible for most of the prediction, while the weights and input become more or less useless.
For example when I was playing with a facial keypoint dataset, some models just predicted the “mean position” of the keypoints, regardless of the input image.
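
For what it’s worth, here is a minimal sketch of such a check, assuming the two layers are named linear1 and linear2 as in the post above (the layers created here are only placeholders; run the same lines on your trained model):

import torch
import torch.nn as nn

# Placeholder layers with the sizes from the original post;
# inspect the layers of your trained model instead.
linear1 = nn.Linear(100, 20)
linear2 = nn.Linear(20, 2)

with torch.no_grad():
    for name, layer in [("linear1", linear1), ("linear2", linear2)]:
        print(name,
              "mean |weight|:", layer.weight.abs().mean().item(),
              "mean |bias|:", layer.bias.abs().mean().item())
    # If the biases dominate the weights by a large margin, the outputs
    # will barely change with the input.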

This code shouldn’t even run because F.leaky_relu() needs an input.

I thought that’s what “unimportant code omitted” meant :smile:

Thank you! I set the bias to 0 and the problem is solved!!
Unfortunately this method did not work, lol

Could you please elaborate a bit more on the bias and “mean prediction” part? I’ve seen this explanation multiple times on the Internet but cannot quite get it. When the learning rate is too high, my understanding is that the model wouldn’t converge. Why would that result in the bias being responsible for most of the prediction? Thanks!

I’m not sure if there is an underlying mathematical explanation for this effect.
In the past I experienced that the bias in the last layer basically took on the mean values of the regression targets, so regardless of the input, I always got the average of my targets.
Could be an edge case and I don’t have a proper explanation for it. :wink:
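
As a rough, illustrative check of that effect (the model and targets below are toy placeholders, not code from this thread): if the weights have collapsed, the last layer’s bias will sit close to the mean of the targets, and different inputs will give nearly identical outputs.

import torch
import torch.nn as nn

# Toy stand-ins; compare your trained model's last-layer bias
# to your real training targets instead.
model = nn.Sequential(nn.Linear(100, 20), nn.LeakyReLU(), nn.Linear(20, 2))
targets = torch.randn(1000, 2)            # regression targets of shape (N, 2)

with torch.no_grad():
    last = model[-1]
    print("mean |weight| of last layer:", last.weight.abs().mean().item())
    print("last-layer bias:            ", last.bias)
    print("mean of the targets:        ", targets.mean(dim=0))
    # With collapsed weights, the outputs for different random inputs are
    # all roughly equal to the bias (i.e. the "mean prediction").
    print(model(torch.randn(3, 100)))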

That’s probably what happened to my model too, except that I did not check the bias values that carefully; I did notice the bias becoming dominant in scale relative to the weights, and the outputs are indeed the mean.

I solved this by 1) normalizing the input (subtracting the mean and dividing by the standard deviation) and 2) using a smaller learning rate.
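
For reference, a minimal sketch of that preprocessing, assuming the word vectors sit in a single (N, 100) tensor (the data, model and the smaller learning rate below are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

inputs = torch.randn(1000, 100)            # placeholder for the (N, 100) word vectors

# 1) Standardize each feature: subtract the mean and divide by the std.
mean = inputs.mean(dim=0, keepdim=True)
std = inputs.std(dim=0, keepdim=True)
inputs = (inputs - mean) / (std + 1e-8)    # small epsilon avoids division by zero

# 2) Use a smaller learning rate than the original 0.02.
model = nn.Linear(100, 2)                  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.001)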

Thanks for the prompt reply!

This problem may be due to batch normalization. When you are evaluating your model, the batch norm layers should not use the current batch statistics: call model.eval() when you want to evaluate the model (so batch norm switches to its running statistics) and call model.train() again when you want to train the model.

Does this mean that if we do validation during training with model.eval(), then in the main training loop we should add model.train() before we use optimizer.step()?

You should call model.train() before the forward pass in your training loop.
If you call it right before optimizer.step(), the forward pass will already have been executed in eval mode.
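
A minimal sketch of that placement, with toy placeholders for the model, data and hyperparameters:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup so the loop below runs; swap in your own model, data and optimizer.
model = nn.Sequential(nn.Linear(100, 20), nn.BatchNorm1d(20), nn.ReLU(), nn.Linear(20, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
train_loader = DataLoader(TensorDataset(torch.randn(64, 100), torch.randint(0, 2, (64,))), batch_size=16)
val_loader = DataLoader(TensorDataset(torch.randn(32, 100), torch.randint(0, 2, (32,))), batch_size=16)

for epoch in range(2):
    model.train()                          # before the forward passes of the training loop
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)               # forward pass runs in train mode
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

    model.eval()                           # switch to eval mode for validation
    with torch.no_grad():
        for data, target in val_loader:
            val_output = model(data)       # batchnorm now uses its running statistics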

Do you just set bias=False during initialization, or is there another way to set the bias equal to 0 for the linear layer?

If you set bias=False during the initialization of the layer, the internal .bias parameter will be set to None and will thus not be available, which would be different from setting the value of the bias to zero.
The latter case can be achieved by manipulating this parameter e.g. via:

with torch.no_grad():
    model.linear_layer.bias.fill_(0.)
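
For contrast, a quick sketch of both options (the layer sizes are arbitrary):

import torch
import torch.nn as nn

no_bias = nn.Linear(10, 2, bias=False)
print(no_bias.bias)              # None: the bias parameter does not exist at all

zero_bias = nn.Linear(10, 2)
with torch.no_grad():
    zero_bias.bias.fill_(0.)     # the parameter still exists (and can still be trained), it just starts at zero
print(zero_bias.bias)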

Thanks!! I’ll try it out.