Overfitting model

Hi All,

I have a Swin Transformer model that is overfitting the data (training accuracy keeps increasing while validation accuracy stays flat) despite several measures to mitigate it, such as weight decay, batch norm, and dropout. The dataset has 8 classes with around 2500 images per class. The batch size is 32.

Model details:

import torch.nn as nn
from torchvision import models

model = models.swin_v2_b(weights='IMAGENET1K_V1')

num_classes = len(image_datasets['train'].classes)  # 8 classes in this dataset

# Simpler alternative: a single linear layer as the head
# model.head = nn.Linear(model.head.in_features, num_classes)

in_features = model.head.in_features  # width of the backbone's output features
model.head = nn.Sequential(
    nn.Linear(in_features, 512),  # New fully connected layer
    nn.BatchNorm1d(512),          # Batch normalization
    nn.Dropout(0.5),              # Dropout with a 50% probability
    nn.Linear(512, num_classes),  # Output layer
)

Would anyone be able to help me with this, please?
Thanks & Best Regards

Hi Michael!

I would not necessarily say that your model has started to overfit yet. To me,
overfitting has set in when your validation performance metrics start getting
worse, even as your training performance metrics continue to improve.
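
To make that criterion concrete, here is a minimal sketch in plain Python. The function name `overfitting_onset` and the `patience` parameter are hypothetical, not part of any library, and the rule is only a rough heuristic: it reports the epoch with the lowest validation loss if the validation loss has failed to improve for `patience` epochs while the training loss kept falling.

```python
def overfitting_onset(train_losses, val_losses, patience=3):
    """Return the epoch index of the best validation loss if the validation
    loss has not improved for `patience` epochs while the training loss
    kept improving; otherwise return None."""
    # Epoch with the lowest validation loss (first one, in case of ties).
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch
    train_still_improving = train_losses[-1] < train_losses[best_epoch]
    if epochs_since_best >= patience and train_still_improving:
        return best_epoch
    return None
```

A validation loss that is merely flat but noisy can also trigger this rule, so treat the result as a prompt to look at the curves rather than as a verdict.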

Consider training significantly longer to check whether your validation
performance metrics might just have hit a temporary plateau and might
start improving again as you continue to train.

I would also recommend looking at your validation loss (and training loss)
in addition to your validation accuracy. You would like to see their behavior
confirm one another. For example, it might be the case that your validation
loss is still going down (perhaps slowly), even as your validation accuracy
is going sideways (is “stable”). That would be a sign that you are not yet overfitting.
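
A toy illustration of how loss and accuracy can disagree, in plain Python with made-up probabilities (the helper `loss_and_accuracy` and all the numbers are hypothetical). The first sample stays misclassified at both snapshots, but the probability assigned to its true class rises, so the cross-entropy loss drops while the accuracy does not move.

```python
import math

def loss_and_accuracy(probs, targets):
    """Mean cross-entropy and accuracy from per-sample class probabilities."""
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    hits = [max(range(len(p)), key=p.__getitem__) == t
            for p, t in zip(probs, targets)]
    return sum(losses) / len(losses), sum(hits) / len(hits)

targets = [0, 1]
# Predicted class probabilities at an earlier and a later epoch (made up).
earlier = [[0.30, 0.60, 0.10], [0.20, 0.70, 0.10]]
later = [[0.40, 0.50, 0.10], [0.10, 0.85, 0.05]]

earlier_loss, earlier_acc = loss_and_accuracy(earlier, targets)
later_loss, later_acc = loss_and_accuracy(later, targets)
print(earlier_loss > later_loss, earlier_acc == later_acc)
```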

Also, you haven’t given us any sense of how long you have trained for.

Consider also using data augmentation to help delay overfitting.

How many samples are in your training and validation datasets?

You haven’t said whether you are only fine-tuning the new model.head or
whether you are also continuing to train the pretrained weights.
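
For reference, training only the head usually comes down to turning off gradients for everything else. A minimal sketch, assuming the model exposes its classifier as `model.head` (as torchvision's Swin models do); the helper name `freeze_backbone` is hypothetical:

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module) -> list:
    """Freeze every parameter except those in model.head.

    Returns the list of parameters that remain trainable.
    """
    for param in model.parameters():
        param.requires_grad = False
    for param in model.head.parameters():
        param.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]
```

You would then give only the returned parameters to the optimizer, for example `torch.optim.AdamW(freeze_backbone(model), weight_decay=1e-2)`, where the weight-decay value is just a placeholder.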

You can also sometimes reduce overfitting by reducing the “capacity” of your
model (basically the number of parameters), generally by making the model
“shallower” or “narrower.” Applying this just to model.head, you might consider
narrowing model.head by reducing the number of its “hidden neurons” to 128
or 64 (from the current value of 512).
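
Here is a sketch of a narrower head with the same layer layout as the one in the question. The helper `make_head` is hypothetical, and the 1024 input width in the usage note is my assumption for swin_v2_b; read the real value from `model.head.in_features` before replacing the head.

```python
import torch
import torch.nn as nn

def make_head(in_features: int, hidden: int = 128,
              num_classes: int = 8, p_drop: float = 0.5) -> nn.Sequential:
    """Same layout as the original head, with a configurable hidden width."""
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.BatchNorm1d(hidden),
        nn.Dropout(p_drop),
        nn.Linear(hidden, num_classes),
    )

def count_parameters(module: nn.Module) -> int:
    """Number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())
```

With 1024 input features, `count_parameters(make_head(1024, 512))` versus `count_parameters(make_head(1024, 64))` shows roughly an eightfold reduction in the head's parameter count.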


K. Frank

Thanks for the reply. The validation set has around 250 images per class. What I am currently trying to do is use the pretrained model as a fixed feature extractor.

Is it possible to do both at the same time?

I will change the hidden neurons and check.

For a better understanding of the issue, please find attached the graphs of the training and testing losses and accuracies. (The "testing" curves are really the validation set; I could not change the labels.)

Thanks & Best Regards

Hi Michael!

At first glance, it does look like your model is overfitting – the training loss
is going down while the validation loss is going up, and so on.

However, on closer inspection, the graphs look fishy to me.

You add a new, randomly-initialized head to your model. Therefore, before
any training or fine-tuning, your model should be making random guesses,
regardless of how well your pre-trained “backbone” matches your use case.

With an eight-class classification problem (with about the same number of
data samples in each class), random guesses will produce an accuracy of about 12.5% (one in eight).

In your accuracy graph, your initial training accuracy (this is presumably the
value after either no training or one epoch of training, depending on when you
take your accuracy snapshot) is over 70%, which seems surprisingly high.
Furthermore, your validation accuracy, although not as high, is still something
like 45%, which is also much better than random guessing.

Your loss values also look fishy. Your validation loss starts out worse than
your training loss, and immediately goes up. Even if your graphs start after
the first epoch of training, it seems highly unlikely that your model (specifically
its head) could have overfit after a single epoch of training (during which it
would have seen each of the approximately 18,000 training images only once).

The results just don’t look sensible.

Again, depending on how your code is organized, the initial points on your
graphs might be after training for an epoch. If so, what are your training
and validation losses and accuracies before training (that is, immediately
after installing your randomly-initialized head)? At this point, your training
and validation losses should be nearly equal and your training and validation
accuracies should both be approximately 12.5%.
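
For reference, the chance-level numbers for a balanced eight-class problem can be worked out directly. The cross-entropy value below assumes perfectly uniform predicted probabilities; a randomly-initialized head will not be exactly uniform, so expect its initial loss to be in this neighborhood (often a little higher) rather than exactly equal to it.

```python
import math

num_classes = 8

# Accuracy of uniform random guessing on balanced classes.
chance_accuracy = 1.0 / num_classes

# Cross-entropy when every class gets probability 1/num_classes:
# -log(1/8) = log(8).
chance_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.1%}")  # 12.5%
print(f"chance loss:     {chance_loss:.3f}")      # 2.079
```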

Last, things like BatchNorm and Dropout behave differently in train
and eval mode. To make sure that you are making an apples-to-apples
comparison, it probably makes sense to plot training and validation losses
and accuracies all computed with your model running in eval mode.
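
A minimal sketch of such an evaluation pass, assuming a classification model, a loader that yields `(inputs, targets)` batches, and a criterion with the default mean reduction such as `nn.CrossEntropyLoss()`. The function name `evaluate` is hypothetical.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def evaluate(model, loader, criterion, device="cpu"):
    """Return (mean loss, accuracy) with the model in eval mode."""
    was_training = model.training
    model.eval()  # Dropout off, BatchNorm uses its running statistics
    total_loss, correct, count = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)
        # criterion returns a batch mean, so weight it by the batch size.
        total_loss += criterion(logits, targets).item() * targets.size(0)
        correct += (logits.argmax(dim=1) == targets).sum().item()
        count += targets.size(0)
    model.train(was_training)  # restore the previous mode
    return total_loss / count, correct / count
```

Calling this once on the training loader and once on the validation loader at the end of each epoch (and once before any training) gives curves that are directly comparable.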


K. Frank

Hi, thanks for the reply. You are right: this graph is from a network that has no measures in place against overfitting. Does this mean anything, in the sense that the validation accuracy could be improved? Please find attached the graphs for the network that does have the overfitting measures in place.

Thanks & Best Regards

Hi Michael!

I don’t know what you mean by this.

In general I don’t understand what you are doing here. You’ve posted some
graphs that look different than they did before, but with no explanation of
what you changed. (I hope that you’re not just blindly changing things until
the results “look good.”)

Have you figured out why your previous graphs looked so fishy?

Have you figured out why – even in your current graphs – the initial values
for your training and validation losses and accuracies differ from one another?

Why don’t your accuracies start at the 12.5% one would expect for random guessing?

Ignoring the above-noted issues, it looks like your training has plateaued, but
that overfitting has not set in. Twenty epochs is not a lot, so I would recommend
training more. Sometimes training “gets stuck” on a temporary plateau, but gets
off of it with additional training.

If your goal is to achieve a higher validation accuracy, I would suggest training
until overfitting has clearly set in. If your validation accuracy is then still not as
high as you would like, consider various techniques for delaying overfitting,
including data augmentation and reducing the capacity of your model.


K. Frank

Hi, thanks for the reply. The second graph is the correct one for this question. The first graph was from a network that did not have any measures (weight decay, dropout, batch norm) in place against overfitting.
Thanks & Best Regards