How to improve my model which predicts well but for the wrong reason?

Hi everyone,

I’m trying to develop a PyTorch model to classify some synthetic images. The model should identify whether an image contains out-of-place bolts or not (the attachment shows two example images, one from each class). I get high accuracy and low loss (98% on the validation set), but when I look at the activation map I see that the model is predicting well for the wrong reason.

I trained my model for 30 epochs (saving the weights of the epoch with the best accuracy) with the Adam optimizer, a learning rate of 0.01, and batches of 10 images.
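For concreteness, here is a minimal sketch of that training setup (Adam, lr 0.01, keeping the best epoch's weights); the function and the `train_loader`/`val_loader` names are illustrative, not the original code:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=30):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_acc = 0.0
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Evaluate and keep the weights of the best epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best_weights.pt")
    return best_acc
```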


Do you know how I could improve my model, pushing it to give more weight to the out-of-place bolts?

This is the activation map:

If you nevertheless get high validation accuracy, what is the issue? I am not sure an activation map can tell you much about how the network works. What exactly is your activation map showing?

If you think your network learned the wrong parts, then maybe you can preprocess the images to show only the region of interest? Since low validation error means your network can classify your validation set very well, are you sure your validation images are correct?

Thanks for your reply. My activation map simply generates a heat map from the weights of the last layer of my ResNet50, which is then overlaid on the image. So, basically, I would expect a red area around the out-of-place bolt.

Regarding my validation set, I selected similar images belonging to both classes (‘clean’, ‘with bolts’). I tried different shufflings and obtained different results (very few cases with low accuracy), but this way I have a model that depends more on the ‘luck’ of having the right images.
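One way to take the luck of the shuffle out of the comparison is to fix the split with a seeded generator; a small sketch, assuming a single dataset object (the 250/150 sizes are from the post, the helper name is illustrative):

```python
import torch
from torch.utils.data import random_split

def fixed_split(dataset, n_train=250, n_val=150, seed=0):
    # Seeding the generator makes the train/validation split reproducible,
    # so results no longer depend on which images the shuffle happens to pick.
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=gen)
```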

This is a screenshot of my training performance:

I assume you then have a layer with two outputs that classifies your image? I don’t think the ResNet output needs to correspond spatially to regions of your image.

You just overfitted your model. Get more data with more varied bolt positions and viewing angles, and use a substantial number of images for validation (>1k). Since you have synthetic images, I assume you can generate more? What do your true/false pairs look like? Do you use exactly the same scene with and without a bolt? Also, are both classes equally distributed in your training and validation data?
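The class-distribution question is easy to check in code; a quick sketch, assuming datasets that yield `(image, label)` pairs, as a torchvision `ImageFolder` does:

```python
from collections import Counter

def class_counts(dataset):
    # Count how many samples each label has; a strongly skewed Counter
    # means the classes are not equally distributed.
    return Counter(label for _, label in dataset)
```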

The classifier is made as follows (note the closing parenthesis of the Sequential):

resnet.fc = nn.Sequential(
             nn.Linear(resnet.fc.in_features, 512),
             nn.Linear(512, 2)
)
I have a lot of synthetic images with more viewing angles and bolt positions. I cannot generate more, but I have lots of data to play with. The only problem is that it may be difficult to retrieve images with exactly the same background with and without bolts.

Right now I’m proceeding in small steps, adding a few similar images at a time and seeing what happens. Consider that I trained the model with a training set of 250 images and a validation set of 150 images.

Do you mean in relation to the dataset or to the specific batch?

250 images are by no means enough. Your model is heavily overfitting. You need at minimum 1k+ images, but 10k–100k would be better.

I think this would be great for the network to learn from, though.

Thank you! I will try with more images.