Outputs from a simple DNN are always the same, regardless of the input

I have built a DNN with only one hidden layer; these are the parameters:

input_size = 100
hidden_size = 20
output_size = 2

def __init__(self):
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.linear2 = nn.Linear(hidden_size, output_size)

def forward(self, x):
    x1 = F.leaky_relu()
    return F.leaky_relu()

# unimportant code omitted
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.02)

Normalized word vectors of size 100 from an authoritative GitHub repository are used as input.
My purpose is to identify whether a word is an event. For example, ‘drought’ is an event but ‘dog’ is not.
After training, the 2-dimensional output tensors are almost the same (e.g. (-0.8, -1.20) vs. (-0.8, -1.21), and (-0.2, -1.01) vs. (-0.2, -1.02)), even if the activation function and loss function are changed.
Could someone tell me the reason? I tried my best but failed to solve it.

Could you check the weight and bias in both layers?
Sometimes, e.g. when the learning rate is too high, the model just learns the “mean prediction”, i.e. the bias is responsible for most of the prediction, while the weights and input become more or less useless.
For example when I was playing with a facial keypoint dataset, some models just predicted the “mean position” of the keypoints, regardless of the input image.
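
For what it’s worth, here is a minimal sketch of such a check, assuming the two layers are named linear1 and linear2 as in the post above (the layers created here are only placeholders; run the same lines on your trained model):

import torch
import torch.nn as nn

# Placeholder layers with the sizes from the original post;
# inspect the layers of your trained model instead.
linear1 = nn.Linear(100, 20)
linear2 = nn.Linear(20, 2)

with torch.no_grad():
    for name, layer in [("linear1", linear1), ("linear2", linear2)]:
        print(name,
              "mean |weight|:", layer.weight.abs().mean().item(),
              "mean |bias|:", layer.bias.abs().mean().item())
    # If the biases dominate the weights by a large margin, the outputs
    # will barely change with the input.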

This code shouldn’t even run because F.leaky_relu() needs an input.

I thought that’s what “unimportant code omitted” meant :smile:

Thank you! I set the bias to 0 and the problem is solved!!
Unfortunately this method did not work, lol

Could you please elaborate a bit more on the bias and “mean prediction” part? I’ve seen this explanation multiple times on the Internet but cannot quite get it. When the learning rate is too high, my understanding is that the model wouldn’t converge. Why would that result in the bias being responsible for most of the prediction? Thanks!

I’m not sure if there is an underlying mathematical explanation for this effect.
In the past I experienced that the bias in the last layer basically took on the mean values of the regression targets, so regardless of the input, I always got the average of my targets.
Could be an edge case and I don’t have a proper explanation for it. :wink:
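
As a rough, illustrative check of that effect (the model and targets below are toy placeholders, not code from this thread): if the weights have collapsed, the last layer’s bias will sit close to the mean of the targets, and different inputs will give nearly identical outputs.

import torch
import torch.nn as nn

# Toy stand-ins; compare your trained model's last-layer bias
# to your real training targets instead.
model = nn.Sequential(nn.Linear(100, 20), nn.LeakyReLU(), nn.Linear(20, 2))
targets = torch.randn(1000, 2)            # regression targets of shape (N, 2)

with torch.no_grad():
    last = model[-1]
    print("mean |weight| of last layer:", last.weight.abs().mean().item())
    print("last-layer bias:            ", last.bias)
    print("mean of the targets:        ", targets.mean(dim=0))
    # With collapsed weights, the outputs for different random inputs are
    # all roughly equal to the bias (i.e. the "mean prediction").
    print(model(torch.randn(3, 100)))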

That’s probably what happened to my model too, except that I did not check the bias values that carefully; I did notice the bias becoming dominant in scale relative to the weights, and the outputs are indeed the mean.

I solved this by 1) normalizing the input (subtracting the mean and dividing by the standard deviation) and 2) using a smaller learning rate.
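
For reference, a minimal sketch of that preprocessing, assuming the word vectors sit in a single (N, 100) tensor (the data, model and the smaller learning rate below are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

inputs = torch.randn(1000, 100)            # placeholder for the (N, 100) word vectors

# 1) Standardize each feature: subtract the mean and divide by the std.
mean = inputs.mean(dim=0, keepdim=True)
std = inputs.std(dim=0, keepdim=True)
inputs = (inputs - mean) / (std + 1e-8)    # small epsilon avoids division by zero

# 2) Use a smaller learning rate than the original 0.02.
model = nn.Linear(100, 2)                  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.001)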

Thanks for the prompt reply!

This problem may be due to batch normalization. When you are evaluating your model, the batch norm layers should not use the current batch statistics: call model.eval() when you want to evaluate the model (so batch norm switches to its running statistics) and call model.train() again when you want to train the model.

Does this mean that if we do validation during training with model.eval(), then in the main training loop we should add model.train() before we use optimizer.step()?

You should call model.train() before the forward pass in your training loop.
If you call it right before optimizer.step(), the forward pass will already have been executed in eval mode.
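
A minimal sketch of that placement, with toy placeholders for the model, data and hyperparameters:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup so the loop below runs; swap in your own model, data and optimizer.
model = nn.Sequential(nn.Linear(100, 20), nn.BatchNorm1d(20), nn.ReLU(), nn.Linear(20, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
train_loader = DataLoader(TensorDataset(torch.randn(64, 100), torch.randint(0, 2, (64,))), batch_size=16)
val_loader = DataLoader(TensorDataset(torch.randn(32, 100), torch.randint(0, 2, (32,))), batch_size=16)

for epoch in range(2):
    model.train()                          # before the forward passes of the training loop
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)               # forward pass runs in train mode
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

    model.eval()                           # switch to eval mode for validation
    with torch.no_grad():
        for data, target in val_loader:
            val_output = model(data)       # batchnorm now uses its running statistics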

Do you just set bias=False during initialization, or is there another way to set the bias equal to 0 for the linear layer?

If you set bias=False during the initialization of the layer, the internal .bias parameter will be set to None and will thus not be available, which would be different from setting the value of the bias to zero.
The latter case can be achieved by manipulating this parameter e.g. via:

with torch.no_grad():
    model.linear_layer.bias.fill_(0.)
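
For contrast, a quick sketch of both options (the layer sizes are arbitrary):

import torch
import torch.nn as nn

no_bias = nn.Linear(10, 2, bias=False)
print(no_bias.bias)              # None: the bias parameter does not exist at all

zero_bias = nn.Linear(10, 2)
with torch.no_grad():
    zero_bias.bias.fill_(0.)     # the parameter still exists (and can still be trained), it just starts at zero
print(zero_bias.bias)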

Thanks!! I’ll try it out.