Trained model predictions are not consistent on the same data

Hello All,

I have been trying to classify two sets of images that look quite similar to each other. One set contains images of GUI buttons (mainly desktop) and the other contains images of textboxes (mainly desktop as well). I have 40 images per class in my training set and 10 per class in my validation set.

I have taken the code (almost verbatim) from the PyTorch transfer learning tutorial, and with slight (mainly cosmetic) changes I could make it work. I am using a resnet50 as a fixed feature extractor with a fully connected layer on top giving a 2-class output, and I have added TensorBoard support.
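The setup looks roughly like this (a minimal sketch; my actual code follows the tutorial almost verbatim):

import torch.nn as nn
from torchvision import models

model_conv = models.resnet50(pretrained=True)
# Freeze the pretrained weights so only the new head is trained
for param in model_conv.parameters():
    param.requires_grad = False
# Replace the final FC layer with a 2-class output
model_conv.fc = nn.Linear(model_conv.fc.in_features, 2)

When I look at my training loss in TensorBoard, it looks like this -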

[TensorBoard loss curves at 0.6 smoothing, 0.87 smoothing, and the highest smoothing]

It looks okay to me; at least at the highest smoothing, the loss seems to be decreasing well.

However, when I evaluate the model on the eval dataset, each evaluation run on the same set of images spits out different predictions. It almost feels like the model is making close-to-random predictions. I am not sure what I am getting wrong here.

Some important details -

1. I am resizing the images to 32x32 squares using torchvision.transforms (even though the originals come in many different sizes, and are rectangular most of the time).

2. I am normalizing them with mean [0.5, 0.5, 0.5] and std [0.5, 0.5, 0.5], as shown in the sketch below.
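In code, the preprocessing is essentially this (a minimal sketch of what I described; the variable name is mine):

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),  # squash every image to a 32x32 square
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])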

Please help me debug this

The loss value itself is quite "high", and depending on which criterion you are using, your model might not be learning enough. Based on the loss curve it also seems that there is initially some decrease in the loss, which then hits a plateau at around iteration (or epoch) 40.

Are you using model.eval() during the validation? This would make sure to disable dropout layers and to use the running stats in batchnorm layers instead of the batch statistics.
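Something along these lines during validation (a minimal sketch, assuming a val_loader DataLoader):

model.eval()  # use batchnorm running stats and disable dropout
with torch.no_grad():  # no gradients needed during evaluation
    for images, targets in val_loader:
        preds = model(images).argmax(dim=1)
model.train()  # switch back before the next training epoch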

OK, I agree. I wanted to go back to the drawing board and start with a model that I intentionally try to over-fit, to prove that it can learn something (given that I have only 100 images, 50 per class, that should not be too difficult a task, or at least that is what I thought). So I will detail what is happening here.

Case 1

Model -

import torch
import torch.nn as nn


class Reshape(torch.nn.Module):
    def forward(self, x):
        # Flatten 64 feature maps of 7x7 (120x120 halved by four pooling layers)
        return x.reshape(-1, 64 * 7 * 7)


model_conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           Reshape(),
                           nn.Linear(3136, 256), nn.ReLU(),  # 3136 = 64 * 7 * 7
                           nn.Linear(256, 128), nn.ReLU(),
                           nn.Linear(128, 84), nn.ReLU(),
                           nn.Linear(84, 2)
                          )
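A quick shape check on a dummy batch (assuming the 120x120 input size from the transforms below) confirms the flattened size:

x = torch.randn(4, 3, 120, 120)  # dummy batch of four RGB 120x120 images
print(model_conv(x).shape)       # torch.Size([4, 2])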

Transforms -

from torchvision import transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((120, 120)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    'test': transforms.Compose([
        transforms.Resize((120, 120)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
}

As you can see, I have not added Dropout or any standard regularization technique, so that I give it every chance to over-fit.

This is what the loss curve looks like -

So the loss hits a plateau very early and never decreases. (The loss value is quite high, too.)

Also, this is the optimizer (note the lr) - optimizer_conv = optim.SGD(model_conv.parameters(), lr=0.01, momentum=0.9)

And a scheduler for the lr - exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
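I step the scheduler once per epoch, as in the tutorial (a sketch; num_epochs is a placeholder):

for epoch in range(num_epochs):
    # ... one full training epoch: forward, loss, backward, optimizer_conv.step() ...
    exp_lr_scheduler.step()  # decays the lr by gamma=0.1 every step_size=7 epochs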

Case 2
Same model, just changing the lr

optimizer_conv = optim.SGD(model_conv.parameters(), lr=0.001, momentum=0.9)

This keeps the curve very similar (although, from the printed numbers, the validation loss and accuracy both seem worse than last time) -

Case 3

Here I go for a simpler model

Model -

class Reshape(torch.nn.Module):
    def forward(self, x):
        # Flatten 16 feature maps of 30x30 (120x120 halved by two pooling layers)
        return x.reshape(-1, 16 * 30 * 30)


model_conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           Reshape(),
                           nn.Linear(14400, 256), nn.ReLU(),  # 14400 = 16 * 30 * 30
                           nn.Linear(256, 84), nn.ReLU(),
                           nn.Linear(84, 2)
                          )

The optimizer is - optimizer_conv = optim.SGD(model_conv.parameters(), lr=0.001, momentum=0.9)

Here I experience something strange. For some runs of the exact same code, a graph like this is produced -

where the loss barely decreases at all.

But for other runs, the following kind of graph is produced -

which shows the loss decreasing well; the validation accuracy even reaches 100% (larger than the test one!).

What do you think is wrong here? Why was the earlier network not able to over-fit the data?


Also, an interesting observation: on the exact same dataset, a classical HOG+SVM model reaches about 95% accuracy with a bit of effort.

This makes me wonder whether I need more data. However, I would have expected that using a CNN as a fixed feature extractor would give me a great result.
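For reference, a HOG+SVM baseline of that kind can be sketched like this (hypothetical; I am assuming scikit-image's hog and scikit-learn's SVC here, with dummy data standing in for the real button/textbox images):

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def hog_features(image):
    # Resize so every descriptor has the same length, then compute HOG on grayscale
    gray = resize(image, (120, 120)).mean(axis=2)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

images = np.random.rand(8, 64, 96, 3)  # dummy stand-ins for the real images
labels = np.array([0, 1] * 4)

X = np.stack([hog_features(img) for img in images])
clf = SVC(kernel='rbf').fit(X, labels)
print(clf.score(X, labels))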


Another update: with the new (120, 120) resize transform, the resnet fine-tuning is actually going much better :slight_smile: I guess the 32x32 resize I had before was too small to distinguish between two classes of images that are really close to each other in look and feel. This is the loss curve when I train the final layer of resnet18 in this new setting -


Another update: some experiments with the lr (0.01) and my first model show that sometimes the loss does decrease, and sometimes it does not. That can even happen in two consecutive runs. This does not make sense to me. Why is the behavior not consistent if everything else stays the same? What am I missing here?

This might be expected if your overall training is unstable, and you would have to play around with some hyperparameters. Also, if you don't seed the code and don't force it to be deterministic, slight variations between runs are expected. In the best case your training would be stable enough to yield approximately the same final performance. However, as your loss curves show, the training often gets stuck.
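A minimal seeding setup looks like this (a sketch; full determinism on the GPU may additionally require deterministic algorithm flags):

import random
import numpy as np
import torch

seed = 2809
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)           # seeds the CPU and current-GPU RNGs
torch.cuda.manual_seed_all(seed)  # all GPUs, if any

# Optional: trade speed for reproducibility in cuDNN
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False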

The model is generally able to overfit random data, as seen here:

import torch
import torch.nn as nn

torch.manual_seed(2809)

model_conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                           nn.AvgPool2d(kernel_size=2, stride=2),
                           nn.Flatten(),
                           nn.Linear(3136, 256), nn.ReLU(),
                           nn.Linear(256, 128), nn.ReLU(),
                           nn.Linear(128, 84), nn.ReLU(),
                           nn.Linear(84, 2)
                          )

data = torch.randn(10, 3, 120, 120)  # 10 random "images"
target = torch.randint(0, 2, (10,))  # 10 random binary labels
optimizer = torch.optim.Adam(model_conv.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

nb_epochs = 200
for epoch in range(nb_epochs):
    optimizer.zero_grad()
    output = model_conv(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('epoch {}, loss {}'.format(epoch, loss.item()))

I see, I am going to manual_seed the model and then see how it behaves. However, since things are okay(ish) with resnet18/34, I may stick to transfer learning, especially given how small the dataset actually is.