My model does not overfit

Hi,

I’m a DL beginner and I’m trying to build a small model that takes an image of an eye and maps it to an x-coordinate on the screen. The x-coordinates range from 0 to around 1600.

I collected around 500 samples, each with its corresponding x-coordinate. I managed to go from a starting RMSE of about 800px down to about 540px. So I tried with better data (higher quality, less noise) and more samples (2000 images). I re-trained my model and now the RMSE converges to about 490px.

This isn’t great, so I thought that there might be something wrong with the model itself. I read somewhere that one can perform an “overfit check”, i.e. training on only 2 samples and letting the model overfit. That would mean getting my RMSE down to near 0 on my 2 samples after thousands of epochs.

But my model doesn’t overfit. It converges to around 370px.
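For reference, an overfit check of this kind can be sketched on synthetic data. This is only a minimal illustration, not the code from this post: the random inputs, the small target values, the hidden size of 32 and the learning rate are all assumptions chosen so the example runs quickly.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two synthetic "images" with the same shape as in this post (3 x 40 x 10).
# The values are random stand-ins, not real eye images.
x = torch.rand(2, 3, 40, 10)
# Targets have shape (2, 1) so they line up with the model output.
y = torch.tensor([[0.2], [0.9]])

# Small fully connected model (sizes are illustrative assumptions).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 40 * 10, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Train on the same two samples over and over.
for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

final_rmse = loss.item() ** 0.5
print(f"final RMSE on the 2 samples: {final_rmse:.6f}")
```

If a model with this much capacity cannot drive the error on two samples close to zero, that usually points at the data pipeline, the loss, or the scale of the inputs and targets rather than at the architecture.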

I tried a lot of things:

  • simplifying the architecture, so I went from Conv → Fc → Fc → Fc to Fc → Fc
  • removing any data augmentation & regularization
  • playing around with learning rate, number of neurons, batch_size (set to 2 since I’m trying to overfit on 2 samples)
  • using PyTorch’s nn.MSELoss() instead of my own RMSELoss function

But it doesn’t work.

Here’s my code:

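import os

import cv2
import numpy as np
import pandas as pd
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class MyDataset(Dataset):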
    def __init__(self, csv_file, root_dir):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        img = cv2.imread(img_path)
        res = cv2.resize(img, dsize=(40, 10), interpolation=cv2.INTER_CUBIC)
        res = res.astype(np.float32)
        res = torch.from_numpy(res)
        res = res.permute(2, 1, 0)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))

        return (res, y_label)


class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.fc2 = nn.Linear(1200, 120)
        self.fc3 = nn.Linear(120, 1)

    def forward(self, x):
        out = x.reshape(x.size(0), -1)
        out = self.fc2(out)
        out = self.fc3(out)
        return out

dataset = MyDataset(
    csv_file='teeeest.csv',
    root_dir='dlib_data'
)

train_set, test_set = torch.utils.data.random_split(dataset, lengths=[2, 1])

train_loader = DataLoader(dataset=train_set, batch_size=2, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=2, shuffle=True)

def RMSELoss(yhat,y):
    return torch.sqrt(torch.mean((yhat-y)**2))

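# use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = ConvNet().to(device)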
# criterion = nn.MSELoss()
criterion = RMSELoss

optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)

losses = []

for epoch in range(30000):
    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)
        targets = targets.to(device=device)
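        # model output is (N, 1) but targets are (N,); squeeze so the
        # subtraction in the loss stays element-wise instead of broadcasting
        scores = model(data).squeeze(1)
        loss = criterion(scores.float(), targets.float())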
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # report the latest batch loss; averaging all of `losses` would mix in every earlier epoch
    print('Cost: {0} = {1}'.format(epoch, loss.item()))

Does anyone have any idea on how to make my network overfit, so I know the network’s code is working?
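One thing that may be worth ruling out with a loop like the one above is a shape mismatch between predictions and targets. A final nn.Linear(120, 1) layer returns a tensor of shape (N, 1), while a batch of scalar labels from a DataLoader has shape (N,), and subtracting the two broadcasts to an (N, N) matrix. The numbers below are made up for illustration and the helper rmse is just a copy of the RMSE formula; the point is that even perfect predictions give a large loss when the shapes differ.

```python
import torch

# Made-up values: predictions that match the targets exactly.
scores = torch.tensor([[100.0], [900.0]])  # shape (2, 1), like the output of nn.Linear(120, 1)
targets = torch.tensor([100.0, 900.0])     # shape (2,), like a batch of scalar labels

def rmse(yhat, y):
    return torch.sqrt(torch.mean((yhat - y) ** 2))

# (2, 1) - (2,) broadcasts to (2, 2): every prediction is compared with every target.
mismatched = rmse(scores, targets)
# (2,) - (2,) stays element-wise.
matched = rmse(scores.squeeze(1), targets)

print((scores - targets).shape)
print(mismatched.item(), matched.item())
```

With the mismatched shapes the best the optimizer can do is predict the mean of the targets for every sample, so the loss plateaus well above zero no matter how long training runs.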


Make sure both your inputs and your outputs are normalized and try it again. See “How to normalize a tensor to 0 mean and 1 variance?” (reply #2 by ptrblck).
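As a rough sketch of what input normalization can look like, the snippet below standardizes a batch of images per channel. The random batch is a stand-in for real data, and the small epsilon is an assumption added to avoid dividing by zero.

```python
import torch

torch.manual_seed(0)

# Stand-in batch of images with pixel values in [0, 255], shape (N, C, H, W).
images = torch.rand(8, 3, 40, 10) * 255.0

# Per-channel statistics over the batch and both spatial dimensions.
mean = images.mean(dim=(0, 2, 3), keepdim=True)
std = images.std(dim=(0, 2, 3), keepdim=True)

# Zero mean, unit variance per channel.
normalized = (images - mean) / (std + 1e-8)

print(normalized.mean(dim=(0, 2, 3)))
print(normalized.std(dim=(0, 2, 3)))
```

In practice the mean and standard deviation would be computed once on the training set and reused unchanged for validation and test images.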

I have already tried normalizing my inputs with a transform, but what exactly do you mean by normalizing my outputs?

I mean if your target outputs are numbers between 0 and 1600, normalize them with the mean and variance of your targets, or rescale them so that they lie between -1 and 1, or both.
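A minimal sketch of both options, assuming the screen bounds are 0 and 1600 as described in the question. The target values and the helper to_pixels are made up for illustration.

```python
import torch

# Made-up x-coordinates in pixels.
targets = torch.tensor([120.0, 480.0, 800.0, 1150.0, 1600.0])

# Option 1: standardize with the mean and std of the training targets.
t_mean, t_std = targets.mean(), targets.std()
standardized = (targets - t_mean) / t_std

# Option 2: rescale to [-1, 1] using the known screen bounds (assumed 0..1600).
x_min, x_max = 0.0, 1600.0
scaled = 2.0 * (targets - x_min) / (x_max - x_min) - 1.0

def to_pixels(pred_scaled):
    """Map a prediction in [-1, 1] back to pixel coordinates."""
    return (pred_scaled + 1.0) / 2.0 * (x_max - x_min) + x_min

print(standardized)
print(scaled)
print(to_pixels(scaled))
```

The model is then trained against the rescaled targets, and its predictions are mapped back with to_pixels (or the inverse of the standardization) before computing an RMSE in pixels.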

Thanks, this seems to be giving some decent results… My model can overfit now.


Which specific solution did you use? It would help other people who find this thread.

Hello. The domain of my target variable is strictly 0-1. In my dataset, however, the target only ranges from 0.1 to 0.6. Is it necessary to normalize it? I’m using Sigmoid at the last layer, so if I normalize, the model won’t be able to predict any real data outside the target range of my dataset.
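The concern can be made concrete with a small sketch. It assumes min-max scaling with the dataset bounds 0.1 and 0.6 and a sigmoid output; the logits and the helper denormalize are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Dataset bounds taken from the question.
data_min, data_max = 0.1, 0.6

def denormalize(s):
    """Undo min-max scaling that mapped [data_min, data_max] to [0, 1]."""
    return s * (data_max - data_min) + data_min

# Hypothetical pre-activation values covering a wide range.
logits = [-10.0, -2.0, 0.0, 2.0, 10.0]

# Without target scaling the sigmoid output can approach 0 and 1.
raw = [sigmoid(z) for z in logits]
# With min-max scaling the mapped-back predictions stay inside (0.1, 0.6).
rescaled = [denormalize(sigmoid(z)) for z in logits]

print(raw)
print(rescaled)
```

Since the targets already lie in the 0-1 range the sigmoid can produce, leaving them unscaled keeps the full range reachable, whereas scaling by the dataset bounds confines predictions to 0.1-0.6.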