What happens when the loss is negative?

Hello Brandon!

This isn’t true. All common optimization algorithms I’m aware
of – and in particular, gradient descent – only care about the
gradient of the loss, and not the loss itself.

Plain-vanilla gradient descent takes the following optimization
step:

new weights = old weights - learning rate * gradient

Could you have misread “learning rate” for “loss” at some point?
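For concreteness, here is a minimal sketch of that update written out by hand in PyTorch (the linear layer, the data, and the learning rate are just made-up placeholders). Notice that only the gradient and the learning rate appear in the update; the value of the loss itself is never used:

    import torch

    # Hypothetical toy setup: a single linear layer and a random batch.
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    lr = 0.1  # learning rate

    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()

    # Plain-vanilla gradient descent step: the loss value plays no role here,
    # only its gradient does.
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad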

Gradient descent (and, again, all common algorithms that I am
aware of) seeks to minimize the loss, and doesn't care whether that
minimum value is a large positive value, a value close to zero,
exactly zero, or a large negative value. It simply seeks to drive
the loss to a smaller (that is, algebraically more negative) value.

You could replace your loss with

modified loss = conventional loss - 2 * Pi

and you should get the exact same training results and model
performance (except that all values of your loss will be shifted
down by 2 * Pi).
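If you want to convince yourself of this, here is a small sketch (the weights and the quadratic loss are just illustrative) that checks that subtracting a constant leaves the gradient unchanged:

    import math
    import torch

    # Hypothetical example: a made-up quadratic loss in the weights w.
    w = torch.randn(5, requires_grad=True)
    x, y = torch.randn(5), torch.tensor(3.0)

    conventional_loss = ((w * x).sum() - y) ** 2
    modified_loss = conventional_loss - 2 * math.pi  # shift by a constant

    grad_conv = torch.autograd.grad(conventional_loss, w, retain_graph=True)[0]
    grad_mod = torch.autograd.grad(modified_loss, w)[0]

    print(torch.equal(grad_conv, grad_mod))  # the constant has zero gradient

This should print True, because the constant shift contributes nothing to the gradient, so the optimization steps (and hence training) are identical.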

We do often use loss functions that become equal to zero when the
model fits the training data perfectly, but the optimization
algorithms don't care about this; they drive the loss to
algebraically more negative values, not towards zero.

Good luck!

K. Frank
