CNN stagnating before epoch 1 (but a little differently)

I have a CNN acting as a regressor that predicts a value between 0 and 1 for images, where 1 represents a nicely ordered, disk-like object and 0 represents an almost random, disorderly moving object.

I have 1,525,760 images to train on, but I’m restricting the number of samples selected per epoch to 23,840. This number was chosen because the batch size is 64 and 1,525,760 / 64 = 23,840, so one epoch under this sampling method is only 1/64th of a true epoch.

To do this I’m using WeightedRandomSampler with the weight of every image set to 1. This way, I believe the DataLoader will draw purely random samples to make each new batch.
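A minimal sketch of this sampling setup on toy data. The dataset size, image shape, and batch size of 4 are placeholders chosen for illustration, and `replacement=False` is an assumption here. Note that the sampler draws a fresh, independent sample each time it is iterated, so 64 short epochs are not guaranteed to visit every image exactly once.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the real data: 640 "images" with targets in [0, 1].
n_total = 640
images = torch.randn(n_total, 1, 8, 8)
targets = torch.rand(n_total)
dataset = TensorDataset(images, targets)

# Same ratio as in the post: one short epoch is 1/64 of the dataset
# (1_525_760 // 64 == 23_840 for the real data).
samples_per_epoch = n_total // 64

sampler = WeightedRandomSampler(
    weights=torch.ones(n_total),   # equal weights -> uniform sampling
    num_samples=samples_per_epoch,
    replacement=False,             # assumption: no duplicates within one short epoch
)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

# Each iteration over the sampler is an independent draw of indices.
first_epoch = list(sampler)
second_epoch = list(sampler)
```

Because the two draws are independent, `first_epoch` and `second_epoch` can share indices even though neither contains duplicates on its own.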

The reason I’m doing this is to observe how the model fits while it is seeing the images for the first time, as in tests with ordinary epochs I found that the validation loss stagnated from epoch 1 onwards.

It looks like the validation loss drops and then stagnates before it hits the end of epoch one when using the new sampling system as seen below:


Please note that the Y-axis is MSE loss.

Perhaps the model is solving part of the regression problem (maybe the k=1 end, where k is the target value, which has lower diversity in image appearance) and then struggling after that?

My question is:
Is there a way to improve the accuracy of the CNN before the validation accuracy stagnates?

Things to consider:
I don’t think this is classic overfitting as the CNN hasn’t seen all of the data by epoch 0.4 at which point the validation and training accuracy diverge.

I’ve tried using dropout in case it was overfitting but all that does is shift the training and validation curves higher in MSE loss.

My model:

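import torch

class RegressionalNet(torch.nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        self.feature_extractor = torch.nn.Sequential(
            # (layers omitted in the original post)
        )
        self.classifier = torch.nn.Sequential(
            # (layers omitted in the original post)
        )
    def forward(self, x):
        features = self.feature_extractor(x)
        # Flatten to (batch_size, -1) before the classifier layers
        output = self.classifier(features.view(x.size(0), -1))
        return output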

I am using:
Adam optimizer with initial learning rate = 1e-3

Any help would be greatly appreciated!

If your hypothesis is that the CNN solves the class 1 problem because it is simpler, you can verify this by looking at the loss or error of each class individually.
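Since the targets are continuous, "each class" can be approximated by binning the target value. A minimal sketch, assuming targets in [0, 1] and plain Python lists of predictions and targets; `binned_mse` and the bin count are hypothetical choices, not something from the thread:

```python
def binned_mse(predictions, targets, n_bins=10):
    """Return {bin_index: mean squared error} for targets in [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pred, target in zip(predictions, targets):
        # target == 1.0 would land in bin n_bins, so clamp it into the last bin.
        b = min(int(target * n_bins), n_bins - 1)
        sums[b] += (pred - target) ** 2
        counts[b] += 1
    # Only report bins that actually contain samples.
    return {b: sums[b] / counts[b] for b in range(n_bins) if counts[b] > 0}

# Made-up numbers, only to show the call.
preds = [0.10, 0.30, 0.90, 1.00]
targs = [0.00, 0.20, 0.90, 1.00]
per_bin = binned_mse(preds, targs, n_bins=5)
```

Plotting one curve per bin over training time would show whether some ranges of the target stop improving earlier than others.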

That’s a good idea, thanks. Do you think it is more informative to do that for the validation accuracy rather than the training accuracy?

Not really sure, but if I were to guess:
train falls to 0 but validation does not --> classic overfitting
train doesn’t fall to 0 but validation does --> validation is a subset of training, possible data leakage
both fall to 0 --> the case you describe, where one class is solved easily while the other is not.

I’m trying to implement the binned errors so I’ll get back to you on that once it’s working :slight_smile:

Could you explain your idea using the WeightedRandomSampler a bit?
I’m not sure I understand it properly.

It seems you are restricting the number of samples per epoch using the Dataset's __len__ method?
If you use WeightedRandomSampler with equal weights and replacement=True, you might get some duplicated samples.
If you just want to iterate your dataset, you don’t have to use a special sampler, as the default one will just use each sample once.

Of course,

So I’m using WeightedRandomSampler with replacement = False and setting the weights for each image in the dataset to equal 1. This way the DataLoader will select all the images over one epoch but given that I can restrict the num_samples, I can observe how the CNN performs on a test set before it’s seen all the data.

This was the only way that I could restrict the number of samples because I couldn’t find a num_samples option in DataLoader.
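One alternative, if the aim is for 64 short epochs to add up to exactly one pass over the data, is to hand out successive chunks of a single shuffled permutation. The class below is a hypothetical helper, not a PyTorch API. The DataLoader documentation describes `sampler` as accepting any iterable with `__len__`, so an object like this should be usable as `sampler=`, though that is not tested in this sketch.

```python
import random

class ChunkedPermutationSampler:
    """Yield one chunk of a fixed random permutation per iteration.

    After `n_chunks` iterations every index has been yielded exactly once,
    then a fresh permutation is drawn for the next full pass.
    """

    def __init__(self, n_total, n_chunks, seed=0):
        if n_total % n_chunks != 0:
            raise ValueError("n_total must be divisible by n_chunks")
        self.n_total = n_total
        self.n_chunks = n_chunks
        self.chunk_size = n_total // n_chunks
        self.rng = random.Random(seed)
        self.perm = []
        self.next_chunk = 0

    def __len__(self):
        # Number of indices yielded per short epoch.
        return self.chunk_size

    def __iter__(self):
        if self.next_chunk == 0:
            # Start of a new full pass: reshuffle all indices.
            self.perm = list(range(self.n_total))
            self.rng.shuffle(self.perm)
        start = self.next_chunk * self.chunk_size
        self.next_chunk = (self.next_chunk + 1) % self.n_chunks
        return iter(self.perm[start:start + self.chunk_size])

# Toy sizes; the real data would use n_total=1_525_760, n_chunks=64.
sampler = ChunkedPermutationSampler(n_total=640, n_chunks=64)
seen = [i for _ in range(64) for i in sampler]
```

With this scheme, "the network has not seen all the data yet" holds exactly until the 64th short epoch, which makes the comparison against a true epoch cleaner.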

Does this make sense?

I have implemented the binned validation and it looks like the network struggles with predicting low values of k on the face of it.

The whole thing really stagnates before 1 epoch (that’s epoch 64 on the plot below), but strangely there’s some good fitting to, say, k=0.3 too, so I’m not entirely sure why it under-performs on some low-k images but not others. To me this would suggest a problem with the dataset, no? Or an under-representation of the test set?

Can you get the training curves as well? If the training is doing well on all groups, then there is an under-representation of that group in the test set. The whole idea is to look at the difference in trends between train and test.
Also, do you have an equal number of samples from all groups?

I can do that, yes, though fair warning: it will take a few days. I don’t have an equal number of samples, because as k approaches zero the diversity in images increases exponentially. So samples with k<0.6 are heavily over-represented, whereas there are fewer samples for, say, k~0.9, because the network has been shown to predict these images well on unseen data.

In retrospect, I don’t know if I can do this for training. The only way I could do it for validation was by splitting my validation set into groups and running them through the network group by group. The train loader selects the data randomly, so isolating the groups within a batch, passing them through the network again, and cataloguing the accuracy seems very tricky.
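One way around a second forward pass might be to reuse the predictions already computed for the training loss and accumulate per-sample squared errors by target bin. A rough sketch, assuming targets in [0, 1]; `update_bin_stats`, the bin count, and the example tensors are made up for illustration:

```python
import torch

n_bins = 10
bin_sq_err = torch.zeros(n_bins)   # running sum of squared errors per bin
bin_count = torch.zeros(n_bins)    # running number of samples per bin

def update_bin_stats(predictions, targets):
    """Accumulate per-sample squared errors into target-value bins."""
    with torch.no_grad():
        preds = predictions.flatten()
        targs = targets.flatten()
        sq_err = (preds - targs) ** 2
        # Map targets in [0, 1] to bins 0..n_bins-1 (1.0 goes in the last bin).
        bins = (targs * n_bins).long().clamp(max=n_bins - 1)
        bin_sq_err.index_add_(0, bins, sq_err)
        bin_count.index_add_(0, bins, torch.ones_like(sq_err))

# Stand-in for one training batch; in a real loop `predictions` would be
# the model output that was already computed for the loss.
targets = torch.tensor([0.05, 0.05, 0.95, 1.0])
predictions = torch.tensor([0.25, 0.05, 0.95, 0.5])
update_bin_stats(predictions, targets)

# Mean squared error per bin; empty bins report 0 instead of dividing by 0.
per_bin_mse = bin_sq_err / bin_count.clamp(min=1)
```

Calling `update_bin_stats` once per batch and resetting the two accumulators at the start of each epoch would give training curves per bin at essentially no extra cost.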

Looking at the plot, I’m noticing some anti-correlation between predictions for classes below and above k~0.5.

It makes improving the global test accuracy difficult, as the test score for high-k images only improves when the test score for low-k images drops.

This is undoubtedly because of the imbalanced dataset, but is there a way to stop this happening without collecting more data? The frequency of occurrence of low-k images is very, very low.

I wonder if anyone has any advice on dealing with regression problems where the diversity of images is very high in a class whose frequency of occurrence is very low?
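One commonly used option for imbalance, offered here only as a sketch and not as something confirmed to work on this dataset, is to sample each target bin roughly equally often by giving every image a weight inversely proportional to the size of its bin. `inverse_frequency_weights` is a hypothetical helper; the resulting list could be passed as `weights` to WeightedRandomSampler with `replacement=True`.

```python
from collections import Counter

def inverse_frequency_weights(targets, n_bins=10):
    """One sampling weight per sample, inversely proportional to its bin size."""
    # Targets are assumed to lie in [0, 1]; 1.0 is clamped into the last bin.
    bins = [min(int(t * n_bins), n_bins - 1) for t in targets]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

# Made-up targets: one sample near 0, three near 0.15, two near 0.95.
targets = [0.05, 0.15, 0.15, 0.15, 0.95, 0.95]
weights = inverse_frequency_weights(targets, n_bins=10)
```

The weights within each bin sum to 1, so every occupied bin has the same total probability of being drawn. Rare images are then repeated more often, which changes the effective training distribution and can encourage memorising them, so the binned validation curves are worth watching after such a change.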