Does my loss curve show the model is overfitting?

The following is my loss curve. It includes the training batch loss and validation loss.

I also checked the F1 score and AUROC on the validation data. Both of them seem to be performing well.

However, I’m confused about two things:

(1) Both the training loss and the validation loss are decreasing, but the validation loss is consistently higher than the training loss, and it never increases. It seems I could continue training even at the end (100k steps). Is this still considered overfitting? If early stopping were implemented, it would not be triggered yet, since the validation loss is still decreasing.

(2) The training loss drops sharply at the beginning of each epoch. What is the reason behind this? Do I need to worry about it? I have shuffled the training data in the DataLoader. In the model, the last layer is a vector inner product layer. When I replace this layer with a block of 3 linear layers with LeakyReLU activations, keeping everything else the same, the sharp drop in training loss disappears. So I think the sharp drop is caused by the vector inner product layer. But why?
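For reference, here is a minimal sketch of the two output heads I compared (the embedding and hidden dimensions are placeholders, not my actual ones):

```python
import torch
import torch.nn as nn

class InnerProductHead(nn.Module):
    """Original last layer: score = <u, v> for a pair of embeddings."""
    def forward(self, u, v):
        # batched inner product over the embedding dimension
        return (u * v).sum(dim=-1)

class MLPHead(nn.Module):
    """Replacement: a block of 3 linear layers with LeakyReLU between them."""
    def __init__(self, dim=64, hidden=128):  # placeholder dimensions
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, u, v):
        # concatenate the two embeddings and score them with the MLP
        return self.net(torch.cat([u, v], dim=-1)).squeeze(-1)
```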

Overfitting is generally indicated by the training loss continuing to decrease while the validation loss starts to increase at some point (e.g., after some number of epochs). So no, I wouldn’t call this overfitting.

Why would a sharp drop in the training loss be a thing of concern? If you have a large training dataset with a “simple” distribution – silly example: 1 million images with 50% red circles and 50% blue squares – then the model will have learned everything there is to learn after 1 epoch, I’m sure :).

I would personally not consider this to be overfitting, but it is somewhat concerning that your train and val accuracy are so far apart. This can happen for a variety of reasons. A common one is some sort of bias in how you chose your train and val split, so that the two sets end up with different data distributions.
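If your task allows it, one quick way to rule that out is a stratified split, so both sets share the same label distribution. A minimal sketch (X and y are placeholder arrays standing in for your data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 16)           # placeholder features
y = np.random.randint(0, 2, size=1000)  # placeholder binary labels

# stratify=y keeps the class proportions identical in train and val,
# ruling out one common source of train/val distribution mismatch
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```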

I’m not too worried about the gap between train and val loss – these can differ because the train loss is computed on mini-batches while the weights are being updated by gradient steps, whereas the val loss is computed with fixed weights. I would run through the training and validation sets once without a backward pass (with the model in eval mode) and check whether the losses line up more closely than what you’re seeing here.
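Something like this sketch, assuming a standard model / criterion / DataLoader setup where the criterion averages over the batch:

```python
import torch

@torch.no_grad()
def mean_loss(model, loader, criterion, device="cpu"):
    """Average per-sample loss over a loader with frozen weights."""
    model.eval()  # disables dropout, uses running batch-norm statistics
    total, count = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        # assumes the criterion averages over the batch (reduction="mean")
        total += criterion(outputs, targets).item() * targets.size(0)
        count += targets.size(0)
    return total / count

# compare like with like: both losses from the same frozen weights
# train_loss = mean_loss(model, train_loader, criterion)
# val_loss   = mean_loss(model, val_loader, criterion)
```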

Hi Cyber!

This is not overfitting (by my definition) and the general structure of
your results does not surprise me.

Your validation loss continues to decrease (albeit slowly) and your
validation performance metrics continue to increase. So (as long as
you are happy investing further training time) keep training to get
a better model.

The gap between your training loss and validation loss is quite typical
of realistic models.

Here’s, crudely, what’s going on:

Your training set encodes two kinds of information: information about
the real problem you’re trying to solve (say cats vs. dogs), e.g., these
cats have intelligent expressions on their faces and these dogs are
scratching fleas; and “random” information specific to your training set,
e.g., this cat picture has a chessboard in the background and this dog
picture has a cluster of dark pixels in the upper right corner.

When you train on your training set, your model learns whatever it
can to get the right answer: both “looks smart,” cat, “fleas,” dog; and
“chessboard,” cat, “dark pixels,” dog help your model make good
predictions on your training set. But only the first two “features” help
your model make good predictions on your validation set. Since your
model makes use of both kinds of information with the training set,
it does a better job – lower loss – on the training set than on the
validation set, where it can only make use of the first kind of information.

How does overfitting happen? It’s when the model trains its parameters
to focus on “chessboard” vs. “pixels” at the expense of “smart” vs. “fleas.”
It gets better and better and better at the “random,” training-set-specific
information – so the training loss goes down, but it makes increasingly
little use of the “real” information, so the validation loss goes up. The
more you train, the worse your model becomes (except for the artificial
case of the training set).

So: validation loss keeps going down – keep training (if you care); validation
loss starts going up (systematically – not just little fluctuations) – stop
training because you’re making your model worse.
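In code, that rule is just early stopping with a patience window, so that
small fluctuations don’t trigger it. A rough sketch – train_one_epoch,
evaluate, and max_epochs are placeholders for your own training loop:

```python
import torch

best_val = float("inf")
patience, bad_epochs = 5, 0  # tolerate 5 epochs without improvement

for epoch in range(max_epochs):               # max_epochs: placeholder
    train_one_epoch(model, train_loader)      # placeholder training step
    val_loss = evaluate(model, val_loader)    # placeholder validation pass

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # systematic rise, not a fluctuation
            break
```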

As for why the training loss drops sharply at the start of each epoch: I
don’t know – but it might be interesting and fun to try to figure it out.

The short answer is no – your validation metrics look fine, so your model
is working.

But something is going on. It might be innocuous, but it could be a real
issue that is perhaps making your training less efficient. If it were me, I
would try to track down the cause of the training-loss drops, but I
wouldn’t knock myself out over it.

I don’t know how an inner-product layer works, so I have no idea why. However,
given that you can use your validation set to see how well your model is
actually doing, I would use whatever version of your model ends up
giving you better performance on your validation set after training. If the
training has these unexpected drops in it, so be it (but I’d still spend at
least a little time trying to track down the cause).

Best.

K. Frank