How to Prevent Overfitting

How do you prevent overfitting when your dataset is not that large? My dataset consists of 110 classes, with a total dataset size of about 20k images.

I have tried data augmentation by a factor of about 16x, but it does not help too much with overfitting.

Right now, with my augmented dataset, at epoch 8, I am getting a testset Top1 accuracy of 45% but a trainset Top1 accuracy of 69%.

Some suggestions:

  • add weight decay.
  • reduce the size of your network.
  • initialize the first few layers of your network with pre-trained weights from ImageNet.
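To make the first suggestion concrete, here is a minimal pure-Python sketch of what weight decay does in a single SGD step. The function name and the numbers are hypothetical and chosen only for illustration. In PyTorch the same effect is normally requested through the `weight_decay` argument of an optimizer such as `torch.optim.SGD`, rather than written by hand.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=1e-4):
    """Return updated weights after one SGD step with L2 weight decay.

    Each weight moves against its gradient and is additionally pulled
    toward zero in proportion to its own size.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero gradients only the decay term acts, so every weight shrinks toward 0.
decayed = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5)
print(decayed)
```

The decay value used here is deliberately large so the shrinkage is visible; values used in practice are usually much smaller and need tuning per dataset.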

Right, I switched from a ResNet50 pretrained on ImageNet to a ResNet18, and that lowered the overfitting, so that my trainset Top1 accuracy is now around 58% (down from 69%).

Would increasing the data augmentation rate (from 16x to, say, 96x) decrease overfitting further?

One other thing about the nature of my dataset: there is a severe class size imbalance across the 110 classes. Could this be contributing to the overfitting as well? What would be a good solution, perhaps clustering the smaller classes into aggregate classes to match the size of the larger classes? A second net could then break the aggregate classes back down into their original classes for more fine-grained classification.

Lastly, what would you say is a reasonable amount of overfitting in terms of the trainset-testset difference in accuracy measurement? 5%?


Oh, if you have class imbalance, use a weighted sampler (WeightedRandomSampler), so that you see all classes with equal probability. That should help a lot (depending on the severity of your imbalance).

You can give the sampler to your DataLoader.


Thanks for the quick reply.

Ok, I will try that, but I am a bit confused by the documentation. How would I modify my dataloader below to add in the WeightedRandomSampler?

testloader = data_utils.DataLoader(test_dataset, batch_size=20, shuffle=True)

Could you provide some guidance on usage, maybe with an example?

Thanks again

first off, you wouldn’t shuffle your testloader.

Here’s an example:

batch_size = 20
class_sample_count = [10, 1, 20, 3, 4] # dataset has 10 class-1 samples, 1 class-2 samples, etc.
weights = 1 / torch.Tensor(class_sample_count)
sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, batch_size)
trainloader = data_utils.DataLoader(train_dataset, batch_size = batch_size, sampler = sampler)

Awesome, thanks!

Why should you not shuffle your testloader?

Hi @smth, how can the weighted sampler be set up in the trainloader automatically?
I mean, we could compute class_sample_count in the init step of the dataset,
for example in the __init__ function of ImageFolder(data.Dataset).
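One way the counting step could be automated is sketched below in plain Python. It assumes the dataset can hand over a flat list of integer class labels, one per sample. The helper name `make_sample_weights` and the example labels are hypothetical and not part of any library.

```python
from collections import Counter


def make_sample_weights(labels):
    """Return one weight per sample, equal to 1 / (size of that sample's class)."""
    class_sample_count = Counter(labels)
    return [1.0 / class_sample_count[label] for label in labels]


# Hypothetical labels: class 0 has three samples, class 1 has one.
labels = [0, 0, 0, 1]
weights = make_sample_weights(labels)
print(weights)
```

Computing the counts once, when the dataset is built, avoids hard-coding class_sample_count by hand and keeps the weights in step with the data.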

At test time, random shuffling does not affect the measured performance, because every test sample is evaluated once regardless of order.
There is no need to do so.

Hmm, I tried the code above, but got the following error:


batch_size = 20
class_sample_count = [10, 5, 2, 1] 
weights = (1 / torch.Tensor(class_sample_count))
sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, batch_size)


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/", line 81, in __init__
    self.weights = torch.DoubleTensor(weights)
TypeError: torch.DoubleTensor constructor received an invalid combination of arguments - got (torch.FloatTensor), but expected one of:
 * no arguments
 * (int ...)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor)
 * (torch.DoubleTensor viewed_tensor)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor)
 * (torch.Size size)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor)
 * (torch.DoubleStorage data)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor)
 * (Sequence data)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor)

Before creating the sampler, add: weights = weights.double()


Ok, I added that, and tried running the code below, but am getting some odd/erratic results for Top1 accuracy for testset and trainset.

Testset Top1 accuracy tops off at around 20% and trainset Top1 accuracy hits 100%, which is bizarre.

It also does not seem to train on all of the data for each epoch: it progresses to the next epoch quicker than expected.

When I take out the WeightedRandomSampler, I get normal results again.

Also, is it correct to use the trainloader with the sampler for obtaining the trainset accuracy?

ERROR PyTorch 0.1.10

You should strongly consider data augmentation in some meaningful way. If you’re attempting to do classification then think about what augmentations might add useful information and help distinguish classes in your dataset. In one of my cases, introducing background variation increased recognition rate by over 50%. Basically, with small datasets there is too much overfitting so you want the network to learn real-world distinctions vs. irrelevant artifacts like backgrounds / shadows etc.

Alternatively, as Andrew Ng said: “get more data” :) It seems to be the cure for everything.


ERROR PyTorch 0.1.11

Yeah from what I understand, you will get better accuracy if your class sizes are close to equal. I think varying data augmentation could help or culling from the largest classes or using synthetic data to fill in some of the smaller sizes.

Shouldn’t the second argument of the sampler be the total number of samples in the training set? When I set it to batch_size, it only runs for one batch every epoch. @smth


I also had this issue: it seemed to only run for one batch every epoch. Did you find a way to correct this?

@nikmentenson I don’t know the correct way. The documentation is too hard to understand. At the least, the number of samples should be the number of examples in the whole dataset.
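The one-batch-per-epoch symptom follows from simple arithmetic: the loader can only form as many batches as the sampler's draw count allows. A small sketch, assuming the last partial batch is kept and using a hypothetical helper name:

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches a loader yields when the sampler draws num_samples indices."""
    return math.ceil(num_samples / batch_size)


# Sampler asked for only batch_size draws, as in the earlier example.
print(batches_per_epoch(num_samples=20, batch_size=20))
# Sampler asked for one draw per image in a 20k-image dataset.
print(batches_per_epoch(num_samples=20000, batch_size=20))
```

The first call gives 1 batch and the second gives 1000, which matches the observation that epochs finished much faster than expected when the second argument was batch_size.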

Perhaps @apaszke can help?

Yes, I could not understand the documentation either.

I think the correct way to use WeightedRandomSampler in your case is to initialize weights such that

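import torch
from torch.utils.data.sampler import WeightedRandomSampler

prob = [0.6, 0.3, 0.1]  # probability of class 0 = 0.6, of class 1 = 0.3, etc.
# labels[i] = class of the sample at index i in the dataset (assumed to exist already)
reciprocal_weights = [prob[labels[index]] for index in range(len(dataset))]

# One weight per sample, inversely proportional to how common its class is
weights = (1 / torch.Tensor(reciprocal_weights)).double()
sampler = WeightedRandomSampler(weights, len(dataset))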

I went through the sampler.WeightedRandomSampler source, and it simply returns an iterator over indices drawn from a multinomial distribution (with len(weights) categories). Therefore, to draw a full dataset's worth of samples in one epoch and weight each sample inversely to the probability of its class, the weights should be as long as the dataset, with each index holding the weight for the class at that index.
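To check that per-sample weights really balance the classes, here is a small simulation in plain Python. It uses `random.choices` (sampling with replacement) as a stand-in for the PyTorch sampler, so it only illustrates the idea and is not the library's implementation. The labels, seed and draw count are hypothetical.

```python
import random
from collections import Counter

# Hypothetical imbalanced dataset: the label of each sample, by dataset index.
labels = [0] * 70 + [1] * 25 + [2] * 5

class_sample_count = Counter(labels)
# One weight per sample (not one per class): the inverse of its class size.
weights = [1.0 / class_sample_count[label] for label in labels]

rng = random.Random(0)
num_draws = 30000
drawn_indices = rng.choices(range(len(labels)), weights=weights, k=num_draws)

# Fraction of draws that landed in each class.
drawn_class_fraction = {
    cls: count / num_draws
    for cls, count in Counter(labels[i] for i in drawn_indices).items()
}
print(drawn_class_fraction)
```

Each class should come out close to one third of the draws even though class 2 makes up only 5% of the data. The weights list has one entry per sample, which is the point made above about the weights being as long as the dataset.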