Loss stuck for regression model

I’m training a model that returns 2 parameters. These two parameters are used for classical image processing:

  1. a threshold for the Kirsch operator
  2. the number of iterations for the bilateral filter.

The model trains using 300 representative images, along with both parameters that were manually determined.

  • I am currently using ResNet-18 as a convolutional regression model.
  • The fully connected layer is changed to output 2 nodes.
  • The loss function I’ve chosen is the mean squared error.
  • ReduceLROnPlateau is used as the learning rate scheduler, driven by the validation loss.

Unfortunately, my validation loss is stuck. It settles between 2000 and 3000.

Here are some of the things I tried:

  1. Experimented with different models, including resnet36, resnet50, vgg16 and mobilenet.
  2. Resized the images and changed the batch size.
  3. Multiple heads after the resnet18 feature layer, one per output, with the loss calculated separately for iterations and threshold.
  4. A single-output model for each parameter (threshold and iterations) separately.
  5. Replaced the RGB images with HSV images.

I’d really appreciate suggestions on how to succeed. Thank you very much.

No real clue, but some suggestions:

  • Do you know that the task is feasible? Can you estimate those numbers yourself just from the image?
  • Which kind of images are you using? 300 images are very few. Even toy datasets (MNIST or CIFAR) have 60k images. Something more adequate could be using neural networks as feature extractors and then applying classic tools like trees or regression models, or even training only some fully connected layers on top of the features. Retraining a ResNet seems like overkill (and it’s probably overfitting).
  • Networks are not good at predicting raw “numbers”. For example, classification networks are not trained to predict an ID but to generate a probability vector over all possible values. Segmentation predicts per-class binary masks, landmark estimation predicts Gaussian distributions per node, etcetera. Also, predicting unbounded values makes training more unstable, as losses can vary a lot. If your network predicts 100 but the target was 1000, the MSE is huge, and the learning rate is not adjusted for such big gradients compared to predicting 100 when the target is 150.
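
To make the last point concrete, here is a small sketch of the MSE arithmetic and of one common remedy, standardizing the targets with training-set statistics. The label values below are made up, and `normalize` / `denormalize` are hypothetical helper names:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

# Same prediction (100), two different targets
big_miss = mse(torch.tensor([100.0]), torch.tensor([1000.0])).item()   # 900^2 = 810000.0
small_miss = mse(torch.tensor([100.0]), torch.tensor([150.0])).item()  # 50^2 = 2500.0

# Hypothetical [iterations, threshold] labels standing in for the training set
train_targets = torch.tensor([[3.0, 120.0],
                              [5.0, 200.0],
                              [8.0, 310.0],
                              [4.0, 150.0]])
mean = train_targets.mean(dim=0)
std = train_targets.std(dim=0)


def normalize(t):
    """Map raw labels to zero mean and unit variance per column."""
    return (t - mean) / std


def denormalize(t):
    """Map network outputs back to the original units."""
    return t * std + mean


scaled = normalize(train_targets)   # train the network against these values
restored = denormalize(scaled)      # apply to predictions at inference time
```

With standardized targets both outputs live on a comparable scale, so a loss near 1 means the network does about as well as always predicting the mean.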

Hope it helps

Hello Juan,

thank you for your elaboration - most appreciated!
Yes, the task is feasible. After some practice, I can estimate the numbers myself. Of course, there is always room for improvement after the first guess.

The images come from high-resolution cameras (mirror cameras), all taken under water. The two parameters are supposed to adapt the processing methods to the visibility conditions.

Other than that, there is little variation in the content of the images. That’s why I believe that 300 (representative) images should be sufficient.

I will do some more testing based on your input.
Would it make sense to post the source-code?

Thank you once again for your response.

Adding the code here:

import cv2
from torch.utils.data import Dataset
import pandas as pd
import os
import glob
from tqdm import tqdm
from PIL import Image, ImageFilter
import random
import torch
import torch.nn as nn
from torch.nn import MSELoss
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, random_split
import torchvision.transforms as transforms
import numpy as np
import torchvision.models as models
# from torchsummary import summary
from itertools import product, combinations
from random import randint
import torch.nn.functional as F
extensions = [".jpg", ".jpeg", ".JPG"]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = 'cpu'

class CardDataset(Dataset):
    def __init__(self, root_dir, data_file_path, transform=None):
        self.Df = pd.read_csv(data_file_path)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.Df)

    def __getitem__(self, index):
        #img_path = os.path.join(self.root_dir, self.Df['Image Name'][index]+'.JPG')        
        #print('type img_path orig: ', type(img_path))
        #print('img_path orig: ', img_path)

        # find the file whose name starts with the CSV entry and has a known extension
        pattern = os.path.join(self.root_dir, self.Df['Image Name'][index] + '*')
        img_path = [f for f in glob.glob(pattern) if os.path.splitext(f)[1] in extensions][0]
        #print('type img_path new: ', type(img_path))
        #print('img_path new: ', img_path)

        img = Image.open(img_path)
        y_label = torch.tensor([float(self.Df['Iterations'][index]), float(self.Df['Threshold'][index])])
        # y_label = torch.tensor( [float(self.Df['Threshold'][index])])
        # img = Image.new('RGB',(400,200))
        # img.paste(im1,(0,0))
        # img.paste(im2,(200,0))
        # label = y_label.item()
        # print(label)
        if self.transform is not None:
            img = self.transform(img)

            # print(img.size)

        return img, y_label

transform_train = transforms.Compose([
    transforms.Resize((1024, 1024)),
    # transforms.RandomRotation(180),
    # transforms.RandomPerspective(),
    # transforms.ColorJitter(saturation=(0.8, 1.3), contrast=(0.8, 1.4), brightness=(0.8, 1.25)),
    # transforms.RandomPerspective(distortion_scale=0.6, p=1.0)
    # transforms.Resize(300),
    # convert the PIL image to a float tensor so the DataLoader can batch it
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.456, 0.406],
    #                      std=[0.229, 0.224, 0.225]),
])

num_epochs = 600
learning_rate = 0.001
train_CNN = False
batch_size = 16
shuffle = True
pin_memory = True
num_workers = 2

train_set = CardDataset('all_days','all_days.csv',transform=transform_train)
train_set, validation_set = random_split(train_set, [int(0.85*len(train_set)), len(train_set)-int(0.85*len(train_set))])
#validation_set = CardDataset("hologram_classifier_dataset_splitted",'validation',transform=transform_val)
# train_set, validation_set = torch.utils.data.random_split(dataset,[train_size, val_size], generator=torch.Generator().manual_seed(2))
train_loader = DataLoader(dataset=train_set, shuffle=shuffle, batch_size=batch_size, num_workers=num_workers)
validation_loader = DataLoader(dataset=validation_set, shuffle=shuffle, batch_size=batch_size, num_workers=num_workers)

class parameter_model(torch.nn.Module):
  def __init__(self):
    super().__init__()  # must run before submodules are assigned
    self.Conv1 = torch.nn.Conv2d(3, 16, 3, padding = 'same')
    self.Conv2 = torch.nn.Conv2d(16, 32, 3, padding = 'same')
    self.Conv3 = torch.nn.Conv2d(32, 64, 3, padding = 'same')
    self.Conv4 = torch.nn.Conv2d(64, 128, 3, padding = 'same')
    self.pool = torch.nn.MaxPool2d(2,2)
    self.fc1 = torch.nn.Linear(in_features = 128, out_features = 64)
    self.fc2 = torch.nn.Linear(64, 2)
    self.adaptive_pool  = torch.nn.AdaptiveAvgPool2d(output_size=(1, 1))

  def forward(self, x):
    x = self.pool(F.relu(self.Conv1(x)))
    x = self.pool(F.relu(self.Conv2(x)))
    x = self.pool(F.relu(self.Conv3(x)))
    x = self.pool(F.relu(self.Conv4(x)))
    x = self.adaptive_pool(x)
    x = torch.flatten(x, 1)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    return x

# # ct =0
# # # childern = model.children()
# # # print(childern)
# # for child in model.children():
# #   ct += 1
# #   # print(child)
# #   if ct < 10:
# #       for param in child.parameters():
# #           param.requires_grad = False
# # # model.to(device)
# # print(ct)
model = parameter_model()
# print(model)
# from torch.nn.modules.conv import Conv2d
# model = models.resnet18(pretrained=False)
# model.fc = nn.Linear(in_features=512, out_features=1, bias=True)
# print(model)

# move the model to the same device as the data
model.to(device)
# loss_weight = torch.Tensor([0.5, 1])
# loss_weight = loss_weight.to(device)
criterion = MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience=15)

def train():
    # make sure the checkpoint directory exists
    os.makedirs("model_checkpoints", exist_ok=True)
    for epoch in range(1, num_epochs + 1):
        # training-the-model
        model.train()
        total_loss = []

        for data, target in tqdm(train_loader):
            # move-tensors-to-GPU
            data = data.to(device)
            target = target.to(device)

            # clear-the-gradients-of-all-optimized-variables
            optimizer.zero_grad()
            # forward-pass: compute-predicted-outputs-by-passing-inputs-to-the-model
            output = model(data)
            # calculate-the-batch-loss
            loss = criterion(output, target)
            # backward-pass: compute-gradient-of-the-loss-wrt-model-parameters
            loss.backward()
            # perform-a-single-optimization-step (parameter-update)
            optimizer.step()
            # update-training-loss
            total_loss.append(loss.item())

        # validating-the-model
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for data, target in tqdm(validation_loader):
                # move-tensors-to-GPU
                data = data.to(device)
                target = target.to(device)

                # forward-pass: compute-predicted-outputs-by-passing-inputs-to-the-model
                output = model(data)
                # calculate-the-batch-loss
                loss = criterion(output, target)
                val_loss = val_loss + loss.item()

        val_loss = val_loss / len(validation_loader)
        # perform learning rate scheduler step (ReduceLROnPlateau expects the monitored value)
        scheduler.step(val_loss)

        print(f'Epoch: {epoch}\t Train Loss: {sum(total_loss)/len(train_loader)}\t Validation Loss: {val_loss}')
        torch.save(model.state_dict(), "model_checkpoints/checkpoint_" + str(epoch) + ".pth")


if __name__ == "__main__":
    train()

Hmm, the thing is I don’t know how difficult the task is.
I would suggest trying to find a network pretrained on similar images so that you can reuse its features (there should be challenges on Kaggle or similar).
In the worst case, at least use vision networks pretrained on ImageNet.

It would be nice if you could pose the whole problem in pytorch and apply the loss over the images directly. But seems diff