Gate Classification Model Overfitting

The model is meant to classify an image as containing a gate or not. By gate, I mean the crossing gates at the intersection of a railroad and a street. The model is trained on a set of images, some of which include a gate and some of which don't, and the goal is for it to correctly determine whether a given image contains a gate.

The model is trained on images from before the street was repaved and tested on images from after the street was repaved. I can't add images from after the repaving to the training set. I'd like the model to remain accurate regardless of any changes that happen to the background (meaning whatever is behind the gate in the images).

I've been trying to tinker with data augmentation techniques to overcome variations in the background, specifically color, because the repaved street is darker than it was before. The model was previously using the AutoAugment policies for SVHN, ImageNet, and CIFAR10. ChatGPT thinks this is excessive and that I should only use ImageNet or CIFAR10. I tried that and saw only very marginal improvements. Then I added the color transformations below to the training set, and the model became a lot more volatile:

from torchvision import transforms

auto_transform = transforms.Compose([
    # Geometric: resize, then take a random 224x224 crop
    transforms.Resize((256, 256)),
    transforms.RandomCrop((224, 224)),
    # transforms.RandomGrayscale(p=0.8),
    # transforms.AutoAugment(transforms.AutoAugmentPolicy.CIFAR10),
    # Color / photometric augmentations
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.2),
    transforms.RandomGrayscale(p=0.1),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    transforms.RandomAutocontrast(p=0.5),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    # transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
    # transforms.AutoAugment(transforms.AutoAugmentPolicy.SVHN),
    # Convert to tensor and normalize with ImageNet statistics
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

What else can I do/change to fix this problem? Any insight is greatly appreciated!

What is the class distribution in your dataset?

Have you tried viewing the augmentations you're applying? If the training is only volatile when augmentations are used, you may want to tone down any that are excessive.


Class distribution:

Training set class distribution: {'1': 1061, 'neg': 1669}
Testing set class distribution: {'1': 241, 'neg': 113}

'1' refers to images that contain a gate; 'neg' refers to images that do not contain a gate.
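For reference, counts like the ones above can be produced with something like the following (the data/train and data/test paths are just placeholders, not my actual directories):

from collections import Counter
from torchvision import datasets

train_set = datasets.ImageFolder("data/train")
test_set = datasets.ImageFolder("data/test")

def class_distribution(ds):
    # ds.targets holds the class index of each sample
    counts = Counter(ds.targets)
    return {ds.classes[i]: n for i, n in counts.items()}

print("Training set class distribution:", class_distribution(train_set))
print("Testing set class distribution:", class_distribution(test_set))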

I haven't tried viewing the augmentations at all. I'm not sure exactly what I want the augmented images to look like. I just want the model to be able to ignore any changes to the background, whether it's the street being repaved or anything else.

Also, if I have multiple augmentations that are applied randomly, wouldn't that complicate the viewing process? There are so many possible combinations of random augmentations that can apply to a given image.

Is data augmentation not the best solution to this problem?

No, I think augmentation is the right direction. But if we consider human eyesight as a "perfect" baseline, we should be able to detect whether any augmentations are being counterproductive by viewing, say, 10-20 examples per augmentation.

For example, the random crop could be cutting off parts of the image that are necessary for the task. Or perhaps, a significant fraction of the time, the color jitter is making the gate nearly invisible.

It would be sufficient to visually test a sampling of each augmentation, one by one, to see if any is excessive, and it's fairly quick to do.
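If it helps, here's a rough sketch of one way to do that with torchvision's ImageFolder and matplotlib: apply a single augmentation at a time (ColorJitter is used as the example here) and plot a handful of samples. The data/train path is just a placeholder for your own training folder.

from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Test one augmentation at a time; keep the resize so images stay comparable.
single_aug = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.2),
])

# Placeholder path; point this at your own training folder.
dataset = datasets.ImageFolder("data/train", transform=single_aug)

# Show 10 augmented samples; rerun to see new random draws.
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for ax, idx in zip(axes.flat, range(10)):
    img, label = dataset[idx]  # still a PIL image, since ToTensor is omitted
    ax.imshow(img)
    ax.set_title(dataset.classes[label])
    ax.axis("off")
plt.tight_layout()
plt.show()

Swapping ColorJitter out for the random crop, grayscale, etc. one at a time should make it obvious which transform (if any) is washing out the gate.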

When you say the training results are volatile, what is the accuracy without augmentation vs. with?

It also seems this may have been an oversight: "RandomAdjustSharpness" appears twice in the transform list.
