I don't understand: are errors of 0.4 and -0.4 treated the same?
Surely in the first case the gradient needs to be reduced, and in the second it should be increased?
But if I take
abs(Loss)
then the network will not change the gradient correctly …
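
To make the question concrete, here is a minimal sketch, assuming the error is defined as prediction minus target and that there is a single scalar prediction. The function names are hypothetical and not from any library. It shows that although abs() gives the same loss value for 0.4 and -0.4, the derivative of the loss with respect to the prediction is sign(error), so the two cases still yield gradients of opposite sign.

```python
def abs_loss(pred, target):
    """Absolute-error loss: identical value for errors of +0.4 and -0.4."""
    return abs(pred - target)


def abs_loss_grad(pred, target):
    """Derivative of |pred - target| w.r.t. pred, i.e. sign(error).

    At error == 0 the derivative is undefined; 0.0 is used here as a
    common convention (an assumption of this sketch).
    """
    error = pred - target
    if error > 0:
        return 1.0
    if error < 0:
        return -1.0
    return 0.0


def gradient_step(pred, target, lr=0.1):
    """One gradient-descent update applied directly to the prediction."""
    return pred - lr * abs_loss_grad(pred, target)


target = 0.0
for pred in (0.4, -0.4):
    print(
        "pred:", pred,
        "loss:", abs_loss(pred, target),
        "grad:", abs_loss_grad(pred, target),
        "after step:", gradient_step(pred, target),
    )
```

In this sketch both predictions have the same loss, but the update lowers the prediction when the error is positive and raises it when the error is negative, so the direction is carried by the gradient and not by the loss value.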