Hello! I have a NN with a single linear layer and no activation function, i.e. just a matrix multiplication plus a bias (I need it for some tests and came across this issue). The input is 4D, the output is 2D, and the relation between them is `[x, y, x+vx, y+vy] -> [x+2vx, y+2vy]`, so for example `[10, 20, 40, 100] -> [70, 180]`. My training data has no noise, i.e. for a given input the model should ideally produce exactly the right output, and it can be shown that in the perfect case the NN should learn this weight matrix:

`[[-1, 0, 2, 0], [0, -1, 0, 2]]`

with a bias of zero for both output nodes. My training data has 512 examples, and after training this is the weight matrix the NN learned:

`[[-9.9997e-01, 2.6156e-05, 2.0000e+00, -2.6044e-05], [2.6031e-05, -9.9996e-01, -2.5983e-05, 2.0000e+00]]`

and the bias is `[0.0003, 0.0003]`.
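The ideal matrix follows from rewriting each target in terms of the inputs: `x + 2vx = 2(x + vx) - x` and `y + 2vy = 2(y + vy) - y`. As a quick plain-Python check (assuming integer-valued positions and velocities in a made-up range of -100 to 100, since the actual data distribution isn't given), these parameters reproduce noise-free targets exactly:

```python
import random

# Ideal parameters from the question: weight has shape (2, 4), bias is zero.
W = [[-1, 0, 2, 0],
     [0, -1, 0, 2]]
b = [0, 0]


def linear(inp, weight, bias):
    """Compute weight @ inp + bias for one input vector, in plain Python."""
    return [sum(w * v for w, v in zip(row, inp)) + bb for row, bb in zip(weight, bias)]


def make_example(x, y, vx, vy):
    """Build one noise-free pair: [x, y, x+vx, y+vy] -> [x+2vx, y+2vy]."""
    return [x, y, x + vx, y + vy], [x + 2 * vx, y + 2 * vy]


# The example from the question: x=10, y=20, vx=30, vy=80.
inp, target = make_example(10, 20, 30, 80)
print(inp, target, linear(inp, W, b))  # [10, 20, 40, 100] [70, 180] [70, 180]

# Hypothetical data: 512 integer examples, so the arithmetic is exact.
random.seed(0)
for _ in range(512):
    inp, target = make_example(*(random.randint(-100, 100) for _ in range(4)))
    assert linear(inp, W, b) == target
print("ideal parameters reproduce all 512 targets exactly")
```

With integer data there is no rounding at all, so any remaining loss in the real model comes from the optimization (and its floating point arithmetic), not from the data.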

As you can see, the result is very close to the right answer, but I am not sure why the loss doesn't go any lower. Since I have a single linear layer, there should be just one minimum, so the optimizer shouldn't get stuck, and since there is no noise, the loss should simply go to zero; instead it seems to get stuck somewhere around 1e-5. I am using the Adam optimizer. I start training with an LR of 1e-3 and train until there is no significant improvement over several epochs, then I switch to 1e-4 and so on, down to around 1e-8. There are no other tricks besides this, just the linear layer. Can someone tell me if this is normal? I am not sure what prevents the NN from reaching zero loss. Thank you!

# Can't get lower loss

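Since the `weight` and `bias` are already almost perfectly fit, the loss should be quite small at the end of training. If you keep lowering the learning rate (down to `1e-8`), the parameter updates become very small and might even get lost due to floating point precision. Adam's per-parameter step is roughly on the order of the learning rate, and assuming the default `float32` parameters, adjacent representable values near `2.0` are about `2.4e-7` apart, so adding a step of `1e-8` to such a weight simply rounds back to the same value.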
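The rounding effect can be reproduced with the standard library alone. This sketch emulates `float32` addition through the `struct` module (a stand-in for a real `float32` tensor, so the helper names are hypothetical) and adds update steps of various sizes to a weight of `2.0`:

```python
import struct


def to_f32(value):
    """Round a Python float (a float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def f32_add(a, b):
    """Emulate float32 addition: round the operands, add in float64, round back.
    For a single addition this double rounding matches true float32 arithmetic."""
    return to_f32(to_f32(a) + to_f32(b))


weight = 2.0  # roughly the value of the learned 2.0000e+00 entries
for step in (1e-5, 1e-6, 1e-7, 1e-8):
    updated = f32_add(weight, step)
    # Expected: changed for 1e-5 and 1e-6, unchanged for 1e-7 and 1e-8.
    print(f"step {step:.0e}: changed={updated != weight}")
```

Entries near `-1.0` behave similarly (spacing of about `1.2e-7`), while the tiny off-diagonal entries around `2.6e-5` sit where `float32` spacing is much finer, so it is mainly the large entries whose updates vanish at the lowest learning rates.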

You could try increasing the learning rate a bit, or skipping the last few reductions, and see if that lowers the loss. However, since your model is already almost perfectly fit, I’m not sure it’s worth a lot of effort to get the “perfect” numbers.
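One way to try skipping the lowest reductions is to let PyTorch's `ReduceLROnPlateau` divide the learning rate by 10 on plateaus while clamping it with `min_lr=1e-6`, instead of decaying all the way to `1e-8`. This is a sketch under assumptions, not the original training loop: the data ranges, the full-batch update per "epoch", the `patience` value and the epoch count are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical noise-free data: 512 samples with uniformly drawn positions/velocities.
n = 512
x, y = torch.rand(n) * 100, torch.rand(n) * 100
vx, vy = torch.rand(n) * 50, torch.rand(n) * 50
inputs = torch.stack([x, y, x + vx, y + vy], dim=1)     # shape (512, 4)
targets = torch.stack([x + 2 * vx, y + 2 * vy], dim=1)  # shape (512, 2)

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Divide the LR by 10 when the loss plateaus, but never go below min_lr.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=50, min_lr=1e-6
)

losses = []
for epoch in range(3000):
    optimizer.zero_grad()
    loss = F.mse_loss(model(inputs), targets)  # full batch, one step per epoch
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # the scheduler watches the training loss
    losses.append(loss.item())

print(f"final loss: {losses[-1]:.3e}, final lr: {optimizer.param_groups[0]['lr']:.0e}")
print(model.weight.data, model.bias.data)
```

Changing `min_lr` (or the starting `lr`) makes it easy to compare how far each schedule gets, which is a cheap experiment before deciding whether the last digits are worth chasing.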