VGG16 Finetuning - Train and Val accuracy not improving

I am trying to implement the following work:
https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

Amongst many other things, this work found that it is better to use a model with 101 output classes (a classification problem) rather than a single output (a regression problem). This is the first time I'm trying to fine-tune a pretrained model, and I am having some trouble training the network. My dataset is preprocessed as the work suggests, but it seems the loss function is not improving the network: neither the loss nor the accuracy improves as the epochs pass. For now I'm using VGG16 as the work suggests, Adam as the optimizer, and L1Loss (they mention MAE as their evaluation metric, so I thought it would be best to stick with it, though I'm not sure that's the right idea). My model is written as follows:

class vgg16(nn.Module):
    def __init__(self):
        super(vgg16, self).__init__()

        self.vgg16 = models.vgg16_bn(pretrained=True)
        self.vgg16.classifier[6] = nn.Sequential(
            nn.Linear(4096, 1000, bias=True),
            nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(1000, 101),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        outputs = self.vgg16(x)
        return torch.argmax(outputs, dim=1)
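
As I understand the paper, the 101 outputs correspond to ages 0 to 100, and MAE is only the evaluation metric: the final age estimate is the softmax-weighted expected value over the classes. A rough sketch of that decoding step (my reading of the paper, with placeholder tensors, independent of my model above):

import torch

logits = torch.randn(8, 101)                   # placeholder network outputs, shape (batch, 101)
probs = torch.softmax(logits, dim=1)           # class probabilities
ages = torch.arange(101, dtype=probs.dtype)    # class index == age in years
expected_age = (probs * ages).sum(dim=1)       # soft prediction, compared to the true age via MAE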

I have written a training function; its core is the following:

for epoch in range(num_epochs):
        train_bar = tqdm(train_loader)
        train_running_loss = 0.0
        train_running_corrects = 0
        val_running_loss = 0.0
        val_running_corrects = 0
        for inputs, labels in train_bar:            
            model.train()
            # plt.imshow(inputs[0].permute(1, 2, 0))
            # plt.show()
            inputs = inputs.to(device)
            labels = labels.to(device).type(torch.DoubleTensor)
            optimizer.zero_grad()

            with torch.set_grad_enabled(True):
                outputs = model(inputs).type(torch.DoubleTensor)
                # print(outputs)
                # print(labels)
                loss = criterion(outputs, labels)
                loss.requires_grad = True
                outputs = outputs.type(torch.ShortTensor)
                labels = labels.type(torch.ShortTensor)
                # print(outputs, labels)
                train_corrects_per_batch = torch.sum(torch.eq(outputs, labels)).item()
                loss.backward()
                optimizer.step()

            train_running_loss += loss.item() * inputs.size(0)
            train_running_corrects += train_corrects_per_batch

            train_bar.set_description(
                desc=f"Train Loss: {train_running_loss / len(train_bar):.4f} - Accuracy: {train_running_corrects / len(train_bar):.4f}"
            )

        train_epoch_loss = train_running_loss / len(train_bar)
        train_epoch_acc = train_running_corrects / len(train_bar)
        print(f'Train Loss: {train_epoch_loss:.4f} - Accuracy: {train_epoch_acc:.4f}')

        val_bar = tqdm(val_loader)
        # running_loss = 0.0
        # running_corrects = 0
        for inputs, labels in val_bar:            
            model.eval()
            inputs = inputs.to(device)
            labels = labels.to(device).type(torch.DoubleTensor)
            optimizer.zero_grad()

            with torch.set_grad_enabled(False):
                outputs = model(inputs).type(torch.DoubleTensor)
                loss = criterion(outputs, labels)
                outputs = outputs.type(torch.ShortTensor)
                labels = labels.type(torch.ShortTensor)
                corrects_per_batch = torch.sum(torch.eq(outputs, labels)).item()

            val_running_loss += loss.item() * inputs.size(0)
            val_running_corrects += corrects_per_batch

            val_bar.set_description(
                desc=f"Val Loss: {val_running_loss / len(val_bar):.4f} - Accuracy: {val_running_corrects / len(val_bar):.4f}"
            )

        val_epoch_loss = val_running_loss / len(val_bar)
        val_epoch_acc = val_running_corrects / len(val_bar)
        print(f'Val Loss: {val_epoch_loss:.4f} - Accuracy: {val_epoch_acc:.4f}')

I'm not finding the issue in what I have written, so I am sharing it here hoping someone has a hint about what's wrong. I also have a question about fine-tuning: the pretrained network was trained for image classification with 1000 classes on ImageNet, and I'm using it for age classification with 101 classes. Do I need to do anything more to adapt the network to a different classification task like this one, or is it enough to just change the output layer to 101 neurons, as I did?

I hope I have explained the issue well enough. If you need more clarification, please let me know and I will try to make myself clearer. Thanks in advance.

But it seems the loss function is not improving the network: neither the loss nor the accuracy improves as the epochs pass.

How many epochs did you run? Is the loss decreasing?

By the way, the classic VGG classifier looks like this:

(classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )

You have replaced only the 6th layer of the classifier, so it now looks like this:

(classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Sequential(
      (0): Linear(in_features=4096, out_features=1000, bias=True)
      (1): ReLU()
      (2): Dropout(p=0.4)
      (3): Linear(in_features=1000, out_features=101, bias=True)
      (4): LogSoftmax()
    )
  )

which is probably not what you wanted. If you want the classic VGG classifier, try:

self.vgg16.classifier[6] = nn.Linear(in_features=4096, out_features=101, bias=True)
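
With a plain linear head like this, the usual pairing is nn.CrossEntropyLoss on the raw logits with integer class labels (ages 0 to 100). Note that the forward should then return the logits themselves rather than torch.argmax over them: argmax has no gradient, which is probably why you needed the loss.requires_grad = True workaround. A minimal sketch, with dummy tensors standing in for your data:

import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16_bn(pretrained=True)
model.classifier[6] = nn.Linear(in_features=4096, out_features=101, bias=True)

criterion = nn.CrossEntropyLoss()        # expects raw logits and integer labels
images = torch.randn(2, 3, 224, 224)     # dummy batch
labels = torch.randint(0, 101, (2,))     # dummy ages as class indices

logits = model(images)                   # shape (2, 101); no argmax here
loss = criterion(logits, labels)
preds = logits.argmax(dim=1)             # argmax only for accuracy, outside the loss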

For fine-tuning you can also freeze the weights of the feature extractor and retrain only the classifier. You can also experiment with retraining only some layers of the classifier, or the whole classifier plus part of the feature extractor. You can do it like this:

    for param in self.vgg16.parameters():
        param.requires_grad = False

    for param in self.vgg16.classifier.parameters():
        param.requires_grad = True
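
If you freeze parameters this way, it is also common to construct the optimizer from only the parameters that still require gradients (Adam will simply skip frozen parameters whose grad stays None, but filtering keeps the intent explicit). A sketch, assuming model is the wrapper from the question and an arbitrary learning rate:

import torch

trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)  # lr is an assumed value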

Sorry for taking so long to reply; I have been busy with some real-life stuff.

You have replaced only the 6th layer of the classifier, so it now looks like this:

It is true, the 6th layer of the classifier is not as it is supposed to be. The funny thing is that I had previously tried the exact layer you proposed, but it gave the same results.

How many epochs did you run? Is the loss decreasing?

I ran about 40 epochs. If I remember correctly, the loss stays pretty much the same, and the validation accuracy behaves the same way.

For fine-tuning you can also freeze the weights of the feature extractor and retrain only the classifier. You can also experiment with retraining only some layers of the classifier, or the whole classifier plus part of the feature extractor. You can do it like this:

I have not tried training only the classifier, but I suspect it would not change the results. To be honest, at this point I suspect these problems are more related to the dataset itself than anything else. In the coming days I'll be getting back into this project and reviewing the dataset as a whole. I hope I can bring better news then.
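
Before the full dataset review, one quick check I plan to run first is to try to overfit a handful of fixed batches: if the loss will not drop even there, the problem is in the training pipeline (loss, forward, labels) rather than in the data. A rough sketch, assuming the names from my training loop above and a criterion that takes raw logits:

import itertools

small = list(itertools.islice(iter(train_loader), 4))  # a few fixed batches
model.train()
for step in range(200):
    for inputs, labels in small:
        optimizer.zero_grad()
        logits = model(inputs.to(device))              # assumes forward returns raw logits
        loss = criterion(logits, labels.to(device).long())
        loss.backward()
        optimizer.step()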