 # Can't get lower loss

Hello! I have an NN with a single linear layer, no activation function, just a matrix multiplication and a bias (I need to run some tests and came across this issue). The input is 4D and the output is 2D, and the relation between them is `[x, y, x+vx, y+vy] -> [x+2vx, y+2vy]`, so for example `[10, 20, 40, 100] -> [70, 180]`. My training data has no noise, i.e. ideally, for a given input you should get exactly the right output, and it can be shown that in the perfect case the matrix the NN should learn is:
```
[[-1,  0,  2,  0],
 [ 0, -1,  0,  2]]
```
with a bias of zero for both output nodes. My training data has 512 examples and after the training, this is the matrix learnt by the NN:
```
[[-9.9997e-01,  2.6156e-05,  2.0000e+00, -2.6044e-05],
 [ 2.6031e-05, -9.9996e-01, -2.5983e-05,  2.0000e+00]]
```
and the bias is:
`[0.0003, 0.0003]`.
As you can see, the result is very close to the right answer, but I am not sure why it doesn’t go lower than that. Given that I have a single linear layer, there should be just one minimum, so the algorithm shouldn’t get stuck, and given that I have no noise, the loss should simply go to zero, but it seems to get stuck somewhere around 1e-5. I am using the Adam optimizer. I train by starting with an LR of 1e-3 and continue until there is no significant improvement over several epochs, then I drop to 1e-4 and so on, down to around 1e-8. No other tricks besides this, just the linear layer. Can someone tell me if this is normal? I am not sure what prevents the NN from reaching zero loss. Thank you!
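For reference, the setup can be sketched roughly like this (the data generation and scale are my assumptions, since the original training set isn’t shown, and a fixed learning rate stands in for the full schedule):

```python
import torch

torch.manual_seed(0)

# Synthetic, noise-free data for [x, y, x+vx, y+vy] -> [x+2vx, y+2vy].
# The scale (randn * 50) is a guess; the original data isn't shown.
n = 512
x, y, vx, vy = (torch.randn(n) * 50 for _ in range(4))
inputs = torch.stack([x, y, x + vx, y + vy], dim=1)
targets = torch.stack([x + 2 * vx, y + 2 * vy], dim=1)

model = torch.nn.Linear(4, 2)  # single linear layer, no activation
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(20_000):     # fixed LR for brevity, no decay schedule
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

print(loss.item())      # small, but not exactly zero in float32
print(model.weight)     # close to [[-1, 0, 2, 0], [0, -1, 0, 2]]
```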

Since the `weight` and `bias` are almost perfectly fit, the loss should indeed be quite small at the end of training. If you lower the learning rate further (down to `1e-8`), the parameter updates become so small that they can be lost entirely to floating-point precision.
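A quick way to see this float32 limit: once an update is much smaller than the parameter’s resolution, the addition rounds away entirely.

```python
import torch

w = torch.tensor(2.0)        # a weight near its converged value, in float32
step = torch.tensor(1e-8)    # roughly the step a 1e-8 learning rate produces

print(torch.finfo(torch.float32).eps)   # float32 resolution at 1.0 is ~1.19e-07
print((w + step == w).item())           # True: the update rounds away completely
```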

You could try to increase the learning rate a bit, or skip some of the reductions, and see if that lowers the loss. However, since your model is almost perfectly fit, I’m not sure it’s worth a lot of effort to chase the “perfect” numbers.
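If you do want to push further, precision is the lever rather than the optimizer: the same fit in `float64` gets to the exact matrix up to machine precision. Sketched here with a closed-form least-squares solve in place of gradient descent (the data generation is again my assumption):

```python
import torch

torch.manual_seed(0)

# Same mapping as above, but in float64 (data scale is a guess)
n = 512
x, y, vx, vy = (torch.randn(n, dtype=torch.float64) * 50 for _ in range(4))
A = torch.stack([x, y, x + vx, y + vy], dim=1)      # 512 x 4 inputs
b = torch.stack([x + 2 * vx, y + 2 * vy], dim=1)    # 512 x 2 targets

# Closed-form least-squares fit of the 4x2 weight matrix (bias omitted,
# since the ideal bias is zero here)
W = torch.linalg.lstsq(A, b).solution
print(W.T)                                # recovers [[-1, 0, 2, 0], [0, -1, 0, 2]]
print(((A @ W - b) ** 2).mean().item())   # residual near float64 machine precision
```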