Model overfitting - validation not improving

I’m facing an overfitting problem: my model gets very high accuracy on the training set (~99.8%),
while on the validation set I’m getting much worse results (~41% accuracy).
I tried adding more data (30K more images to the training and validation sets) and also tried data augmentation, but the model is just not improving on the validation set.

Any idea why?


You could add some regularization like Dropout or weight decay. Did data augmentation help at all?
What kind of images do you have?

Did you make sure to preprocess the training and validation images in the same way (normalizing, etc.)?

Thanks for your reply.
I added Dropout just before the last fc layer, which didn’t help much. I’m training resnet50 on my dataset, which contains 600x600 images of jewellery.
Data augmentation didn’t help, and adding more data didn’t help either.
I made sure that I’m using the same preprocessing on my training and validation images.
I’m not sure what else I can do, or why it’s not converging on my validation set.

Looks like something is wrong in your pipeline. If you have > 50k images, overfitting with Dropout should not be a big problem.
Try a pretrained model and train just the last layer (so don't optimize the rest of them, just pass model.fc.parameters() to the optimizer).
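
A minimal sketch of that suggestion, assuming a torchvision-style model whose classifier head is exposed as `model.fc`. `TinyNet` is a hypothetical stand-in so the snippet runs without downloading weights; with torchvision you would load a pretrained resnet50 and replace its `fc` instead. The learning rate and momentum are placeholder values.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained backbone with an `fc` head."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyNet(num_classes=10)

# Freeze everything, then re-enable gradients for the last layer only.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Note that frozen BatchNorm layers still update their running statistics while the model is in train() mode, so it may be worth keeping the backbone in eval() mode during this experiment.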

If this also doesn’t help, maybe make a new random split between training and validation images (maybe the distribution of images in these two sets is too different).
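
One way to check this is to compare the class frequencies of the two splits. A rough sketch, assuming the targets are available as plain lists of class ids; `train_targets` and `val_targets` are made-up toy values, not data from this thread.

```python
from collections import Counter


def class_fractions(targets):
    """Return {class_id: fraction of samples} for a list of targets."""
    counts = Counter(targets)
    total = len(targets)
    return {cls: n / total for cls, n in counts.items()}


def max_fraction_gap(train_targets, val_targets):
    """Largest absolute difference in class frequency between two splits."""
    train_frac = class_fractions(train_targets)
    val_frac = class_fractions(val_targets)
    classes = set(train_frac) | set(val_frac)
    return max(abs(train_frac.get(c, 0.0) - val_frac.get(c, 0.0)) for c in classes)


# Toy example (hypothetical labels)
train_targets = [0, 0, 1, 1, 2, 2, 3, 3]
val_targets = [0, 0, 0, 1]

gap = max_fraction_gap(train_targets, val_targets)
print(f"largest class-frequency gap: {gap:.2f}")
```

If the gap is large, passing `stratify=targets` to sklearn's train_test_split keeps the class proportions the same in both splits.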

The last thing to try is training without data augmentation: turn it off and see what happens.

Hi @ptrblck and @melgor,
Here is an example of what is happening to me during training and then evaluating the model.

I made sure that I set model.eval() when evaluating.

I’m out of ideas.

Do you get these bad results for the validation set from the beginning or is it improving a bit?

Could you post your Datasets with all pre-processing code?
How was the data split into training and validation? Was it a random split or was some logic involved?

I get these bad results after a few epochs, when the training accuracy reaches ~78%.
The data was randomly split into training and validation sets using sklearn.model_selection.train_test_split.

This is my Dataset code with all the preprocessing:

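import json
from abc import ABCMeta, abstractmethod

import numpy as np
from PIL import Image  # assumption: load_img below relies on PIL-style .mode / .convert
from torch.utils import data  # assumption: `data.Dataset` refers to torch.utils.data

class ImageAdapter(metaclass=ABCMeta):
	@abstractmethod
	def get_image(self, index):
		"""Return the image at `index` as an array."""

	@abstractmethod
	def get_y(self, index):
		"""Return the target at `index`."""

	@abstractmethod
	def __len__(self):
		"""Return the number of samples."""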

class FileSystemAdapter(ImageAdapter):
	"""docstring for FileSystemAdapter"""
	def __init__(self, videos_path, dataset_json_filename, width=600, height=600):
		super(FileSystemAdapter, self).__init__()
		self.width = width
		self.height = height
		self.dataset, _ = self.load_dataset(dataset_json_filename)

	def load_dataset(self, dataset_json_filename):
		with open(dataset_json_filename, 'r') as fp:
			data = json.load(fp)
			return data["dataset"], None

	def crop_center(self, img,cropx,cropy):
		y,x,c = img.shape
		startx = x//2 - cropx//2
		starty = y//2 - cropy//2    
		return img[starty:starty+cropy, startx:startx+cropx, :]

	def load_img(self, path):
		# The loader call was missing in the post; PIL's Image.open is assumed here
		img = Image.open(path)
		if img.mode != 'RGB':
			img = img.convert('RGB')
		return img

	def img_to_array(self, img):
		return np.asarray(img, dtype=np.float32)/255.

	def image_loader(self, path):
		img = self.load_img(path)
		img_arr = self.img_to_array(img)
		img_arr = self.crop_center(img_arr, self.width, self.height)
		return img_arr

	def get_y(self, index):
		return self.dataset[index]['target']

	def get_image(self, idx):
		image = self.dataset[idx]['image']
		return self.image_loader(image)

	def __len__(self):
		return len(self.dataset)

class MyDataset(data.Dataset):
    def __init__(self, adapter, spatial_transform=None, normalization=(None, None)):
        self.adapter = adapter
        self.spatial_transform = spatial_transform
        self.mean, self.std = normalization

    def get_y(self, index):
        return self.adapter.get_y(index)

    def __getitem__(self, index):
        image = self.adapter.get_image(index)
        target = self.adapter.get_y(index)

        #normalize image
        if self.mean is not None and self.std is not None:
            image = (image - self.mean)/self.std

        #perform spatial transform
        if self.spatial_transform is not None:
            image = self.spatial_transform(image)

        return image, target

    def __len__(self):
        return len(self.adapter)

Is the validation accuracy rising before the training accuracy reaches ~78%?
Are the validation predictions biased towards some classes the whole time?
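
A quick way to see this is to count how often each class is predicted in every epoch. A small sketch, assuming the predicted class ids have been collected into a list; `preds` here is a made-up example.

```python
from collections import Counter


def prediction_histogram(preds, num_classes):
    """Return a list with the fraction of predictions per class."""
    counts = Counter(preds)
    total = len(preds)
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Hypothetical predictions for a 4-class problem, mostly collapsed onto class 1
preds = [1, 1, 1, 2, 1, 1, 0, 1, 1, 1]
hist = prediction_histogram(preds, num_classes=4)
for cls, frac in enumerate(hist):
    print(f"class {cls}: {frac:.0%} of predictions")
```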

The code looks fine, as long as the normalization is performed in the same way for the training and validation data.
Could you check the data stats (mean, std, min, max) for some samples taken from both DataLoaders?
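
Something along these lines would do, assuming each loader yields `(images, targets)` batches of float tensors. The random tensors below are stand-ins for real batches, and `train_loader` in the comment is a hypothetical name.

```python
import torch


def batch_stats(images):
    """Return mean, std, min and max of a batch of images as Python floats."""
    return {
        "mean": images.mean().item(),
        "std": images.std().item(),
        "min": images.min().item(),
        "max": images.max().item(),
    }


# Stand-in batches; in practice use e.g. `images, _ = next(iter(train_loader))`
torch.manual_seed(0)
train_batch = torch.rand(4, 3, 32, 32)
val_batch = torch.rand(4, 3, 32, 32) * 255.0  # simulates a missing /255 scaling

train_stats = batch_stats(train_batch)
val_stats = batch_stats(val_batch)
for key in train_stats:
    print(f"{key}: train={train_stats[key]:.3f}  val={val_stats[key]:.3f}")
```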

Yup, the validation accuracy rises and then starts to drop, giving me a huge loss.
At the beginning, while the validation accuracy is rising, the predictions are spread across all classes, but as
the training accuracy rises the validation predictions become biased towards 1-2 classes (not the same classes every epoch: for example, in epoch 12 the validation predictions are biased towards classes 1 and 2, and in epoch 13 they are biased towards classes 3 and 7).
I checked my data stats (mean, std, min and max), and the training stats are very similar to the validation stats, which is why I’m worried.
I even tried to do inference on my training set and it gave me bad results.
I’m training some of the batchnorm layers in resnet50, but my batch size isn’t small. Maybe it has something to do with this?

Based on the screenshot you shared, your training accuracy and confusion matrix look good.
What do you mean by “tried to do inference on my training set and it gave me bad results”?
Did you switch to eval and the results were also bad on the training set?

If your batch size isn’t small, it’s not problematic to use BatchNorm layers. The opposite could be problematic.

I switched the model to eval() and used my train data loader to test the model on the training set, to see whether my problem also occurs there. It does (I got bad results on my training set), so I guess it’s not a normalization problem, which is what I thought at the beginning.

I’m using batch_size=64 for both the training and the eval phase.

I even made sure that the layers' weights are not updating in eval mode.

Did you change the BatchNorm layers somehow (the momentum etc.)?
What happens if you freeze the BatchNorm layers?

I did not make any changes to the BatchNorm layers.
I tried freezing all the BatchNorm layers and training the model: same results.
I even tried freezing the BatchNorm layers and doing transfer learning: same results.

I might be shooting in the dark but maybe it has something to do with this:


You could definitely try the suggestions from this thread!
Let me know, if it helps somehow.


I have the same problem, and freezing the BN layers does not change anything.

I am doing a simple classification task in the medical imaging area. I have a dataset of about 50k images and 4 classes. The model overfits a lot (below are the training results for 4 epochs):

Validation accuracy before training is 0.28779987373737376

Epoch: 0
Training accuracy: 0.7547950186346696, Validation accuracy: 0.4102746212121212
Training loss: 0.5928439186062924, Validation loss: 1.9086839990182356

Epoch: 1
Training accuracy: 0.9022588855558585, Validation accuracy: 0.38691603535353536
Training loss: 0.25842620344117806, Validation loss: 2.5381281658856554

Epoch: 2
Training accuracy: 0.939960003636033, Validation accuracy: 0.36900252525252525
Training loss: 0.16145963321418264, Validation loss: 2.9992803214776393

So basically, after the first epoch it is overfitting. What should I do? I have tried everything I know: dropout, augmentation, increasing the dataset, using WeightedSampler, and weight decay.

I really do not understand how it can be overfitting so heavily after the first epoch. I have tested models from MobileNet to Inception. Nothing works.

There are two main questions for me:

  1. Why is adding more data not improving the result?
  2. Why is it overfitting right after the first epoch?

  1. Using more data should improve the ability to generalize. I would start by checking e.g. the preprocessing pipeline and making sure that the normalization is applied in both the training and validation use cases. A quick test might be to use the training dataset in the validation loop and check its performance.

  2. I’m unsure if the model is only overfitting after the first epoch, since epoch 0 already shows a worse validation accuracy than the training one.
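
The quick test from point 1 could look roughly like this. It is a sketch assuming a classifier that returns logits and a loader yielding `(images, targets)`; the model and data below are tiny hypothetical stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def accuracy(model, loader):
    """Fraction of correctly classified samples, computed in eval mode."""
    was_training = model.training
    model.eval()  # use BatchNorm running stats, disable Dropout
    correct, total = 0, 0
    for images, targets in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
    model.train(was_training)  # restore the previous mode
    return correct / total


# Hypothetical stand-ins for the real model and training set
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(12, 4))
train_set = TensorDataset(torch.randn(20, 3, 2, 2), torch.randint(0, 4, (20,)))
train_loader = DataLoader(train_set, batch_size=8, shuffle=False)

# Same loop as for validation, but fed with the training data
train_acc = accuracy(model, train_loader)
print(f"training-set accuracy in eval mode: {train_acc:.3f}")
```

If this number is far below the accuracy logged during training, the gap more likely comes from the train/eval difference (e.g. BatchNorm statistics) or from the data pipeline than from poor generalization.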

Thanks for your reply.

I have tested this, and the accuracy when evaluating on the training dataset is near 90%, so I can say everything is fine regarding the preprocessing.

I really do not get why the model is behaving like this. If it is not overfitting, what is happening here? Any suggestions for improving this?

Also, when I plot the confusion matrix, I see that most of the predictions are biased toward the first class. What should I do to remove this bias? Add class weights to the loss or try OneVsRest?
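
For the weight-balancing option, one common choice is inverse-frequency class weights. A sketch assuming integer class ids; the label counts are made up. In PyTorch the resulting list could be passed as `weight=torch.tensor(weights)` to `nn.CrossEntropyLoss`. This would only help if the bias actually comes from class imbalance.

```python
from collections import Counter


def inverse_frequency_weights(targets, num_classes):
    """Weight each class by total / (num_classes * count); balanced data gives 1.0."""
    counts = Counter(targets)
    total = len(targets)
    weights = []
    for cls in range(num_classes):
        count = counts.get(cls, 0)
        # Classes absent from the data get weight 0.0 to avoid division by zero
        weights.append(total / (num_classes * count) if count else 0.0)
    return weights


# Hypothetical imbalanced labels: class 0 dominates
targets = [0] * 6 + [1] * 2 + [2] * 2 + [3] * 2
weights = inverse_frequency_weights(targets, num_classes=4)
print(weights)
```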