Training acc going up, test acc doesn't change :(?

Hi, I'm using a ResNet-34 on the Stanford Cars dataset (https://www.kaggle.com/jutrera/stanford-car-dataset-by-classes-folder).

Here's what is happening:

  • Training loss is decreasing and training accuracy is increasing.
  • Test accuracy stays the same throughout training, no matter how many epochs I run.

After about 5 epochs the model peaks at roughly 90% training accuracy and doesn't improve much after that.
But the validation accuracy stays around 4%, no matter when I measure it (after 1, 2, …, 6 epochs).

Can somebody help me find the problem in my code, or suggest what I might improve?

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
import time
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

tfms = transforms.Compose([transforms.Resize((300, 300)),
                           transforms.ToTensor()])

dataset = torchvision.datasets.ImageFolder(root="../../Data/car_data2/train", transform = tfms)
trainloader = torch.utils.data.DataLoader(dataset, batch_size = 16, shuffle=True)

dataset2 = torchvision.datasets.ImageFolder(root="../../Data/car_data2/test", transform = tfms)
testloader = torch.utils.data.DataLoader(dataset2, batch_size = 16, shuffle=False)

def train_model(model, criterion, optimizer, n_epochs = 5):
    
    for epoch in range(n_epochs):
        running_loss = 0.0
        running_correct = 0.0
        for i, data in enumerate(trainloader, 0):

            # get the inputs
            inputs, labels = data
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            
            # forward + backward + optimize
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            running_correct += (labels==predicted).sum().item()
            
            if i % 10 == 9:    # print every 10 mini-batches (change according to data)
                # accuracy is averaged over the last 10 mini-batches of batch size 16
                print('[Epoch %d, %5d / %5d] loss: %.3f acc: %.3f' %
                      (epoch + 1, i + 1, len(trainloader), running_loss / 10, (100 / 16) * running_correct / 10))
                running_loss = 0.0
                running_correct = 0.0

    print('Finished Training')
    return model

model_ft = models.resnet34(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 196)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

model_ft = train_model(model_ft, criterion, optimizer, n_epochs=5)

# Test the model
correct = 0
total = 0
with torch.no_grad():
    for i, data in enumerate(testloader):
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = model_ft(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the test images: %d %%' % (
    100 * correct / total))

Could you try to add model.eval() before testing your model?
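For example, something along these lines (a minimal sketch reusing the variable names from your code):

model_ft.eval()  # put BatchNorm (and Dropout, if any) into evaluation mode

correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model_ft(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the test images: %d %%' % (100 * correct / total))

model_ft.train()  # switch back to training mode if you train further afterwards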


YES! That seemed to work. May I ask why that is? I understand it disables dropout, among other things. So if I'm using a net that has no batch norm/dropout, model.eval() isn't necessary?

Basically, because I'm using a ResNet, I need model.eval()?

But besides that, thank you very much, it really helped me. I've been wondering about this for days :).

model.eval() changes the behavior of all modules that use self.training internally to switch between training and evaluation mode.
It mostly affects nn.Dropout and nn.BatchNorm layers.
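For example, with a toy model (just an illustration, not your architecture):

import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

print(net.training)                         # True by default: dropout is active, batch norm uses batch statistics
net.eval()                                  # recursively sets .training = False on all submodules
print([m.training for m in net.modules()])  # all False: dropout is a no-op, batch norm uses its running estimates
net.train()                                 # switch back before continuing to train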

Yes, ResNet uses batch norm layers, so you should call model.eval() before testing the model.
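You can verify this yourself, e.g.:

import torch.nn as nn
import torchvision.models as models

model = models.resnet34()  # weights are not needed for this check
bn_layers = [name for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)]
print(len(bn_layers))   # ResNet-34 contains many BatchNorm2d layers
print(model.training)   # True after construction, so call model.eval() before evaluating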
