I am trying to implement the following work:
https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/
Among many other things, this work found that it is better to use a model with 101 output classes (a classification problem) instead of a single output (a regression problem). This is the first time I’m trying to fine-tune a pretrained model, and I am having some trouble training the network. My dataset is preprocessed as the work suggests, but my loss function does not seem to be improving the network: my accuracy doesn’t improve as the epochs pass. For now, I’m using VGG16 as the work suggests, Adam as the optimizer, and L1Loss (they mention using MAE as an evaluation metric, so I thought it would be best to stick with it, but I’m not sure that’s the best idea). My model is written as follows:
class vgg16(nn.Module):
    def __init__(self):
        super(vgg16, self).__init__()
        self.vgg16 = models.vgg16_bn(pretrained=True)
        self.vgg16.classifier[6] = nn.Sequential(
            nn.Linear(4096, 1000, bias=True),
            nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(1000, 101),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        outputs = self.vgg16(x)
        return torch.argmax(outputs, dim=1)
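One thing I am unsure about is whether gradients can flow through the `torch.argmax` in `forward`. Below is a minimal check I could run on a tiny stand-in layer. The layer sizes, inputs and ages are made up for illustration and are not my real model. It also sketches the alternative of keeping the 101 raw outputs, training with a classification loss on them, and taking the expected age from the softmax, which is how I understand the paper's approach:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for the classifier head (hypothetical sizes, not the real VGG16).
head = nn.Linear(8, 101)
x = torch.randn(4, 8)                  # made-up batch of 4 feature vectors
ages = torch.tensor([23, 41, 5, 67])   # made-up integer age labels in [0, 100]

# Path 1: argmax of the outputs, as in the forward() above.
# argmax returns integer indices, which carry no gradient information.
pred = torch.argmax(head(x), dim=1)
argmax_has_grad = pred.requires_grad

# Path 2: keep the raw 101 logits and apply a classification loss to them.
logits = head(x)
loss = nn.CrossEntropyLoss()(logits, ages)
loss.backward()
grad_reaches_weights = (
    head.weight.grad is not None and head.weight.grad.abs().sum().item() > 0
)

# Expected age computed from the class probabilities (for evaluation, e.g. MAE).
probs = torch.softmax(logits.detach(), dim=1)
expected_age = (probs * torch.arange(101, dtype=probs.dtype)).sum(dim=1)

print(argmax_has_grad, grad_reaches_weights, expected_age.shape)
```

If the first flag is `False` and the second is `True`, that would suggest the `argmax` belongs only in the evaluation code and not inside `forward`.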
I have written a train function and its core is the following:
for epoch in range(num_epochs):
    train_bar = tqdm(train_loader)
    train_running_loss = 0.0
    train_running_corrects = 0
    val_running_loss = 0.0
    val_running_corrects = 0

    for inputs, labels in train_bar:
        model.train()
        # plt.imshow(inputs[0].permute(1, 2, 0))
        # plt.show()
        inputs = inputs.to(device)
        labels = labels.to(device).type(torch.DoubleTensor)
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            outputs = model(inputs).type(torch.DoubleTensor)
            # print(outputs)
            # print(labels)
            loss = criterion(outputs, labels)
            loss.requires_grad = True
            outputs = outputs.type(torch.ShortTensor)
            labels = labels.type(torch.ShortTensor)
            # print(outputs, labels)
            train_corrects_per_batch = torch.sum(torch.eq(outputs, labels)).item()
            loss.backward()
            optimizer.step()
        train_running_loss += loss.item() * inputs.size(0)
        train_running_corrects += train_corrects_per_batch
        train_bar.set_description(
            desc=f"Train Loss: {train_running_loss / len(train_bar):.4f} - Accuracy: {train_running_corrects / len(train_bar):.4f}"
        )

    train_epoch_loss = train_running_loss / len(train_bar)
    train_epoch_acc = train_running_corrects / len(train_bar)
    print(f'Train Loss: {train_epoch_loss:.4f} - Accuracy: {train_epoch_acc:.4f}')

    val_bar = tqdm(val_loader)
    # running_loss = 0.0
    # running_corrects = 0
    for inputs, labels in val_bar:
        model.eval()
        inputs = inputs.to(device)
        labels = labels.to(device).type(torch.DoubleTensor)
        optimizer.zero_grad()
        with torch.set_grad_enabled(False):
            outputs = model(inputs).type(torch.DoubleTensor)
            loss = criterion(outputs, labels)
            outputs = outputs.type(torch.ShortTensor)
            labels = labels.type(torch.ShortTensor)
            corrects_per_batch = torch.sum(torch.eq(outputs, labels)).item()
        val_running_loss += loss.item() * inputs.size(0)
        val_running_corrects += corrects_per_batch
        val_bar.set_description(
            desc=f"Val Loss: {val_running_loss / len(val_bar):.4f} - Accuracy: {val_running_corrects / len(val_bar):.4f}"
        )

    val_epoch_loss = val_running_loss / len(val_bar)
    val_epoch_acc = val_running_corrects / len(val_bar)
    print(f'Val Loss: {val_epoch_loss:.4f} - Accuracy: {val_epoch_acc:.4f}')
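A side note on the printed metrics: the running loss is multiplied by the batch size, but then divided by `len(train_bar)`, which I believe is the number of batches and not the number of samples, so the values shown may not be per-sample averages. Here is a minimal sketch of the averaging I probably intended. The function name and the numbers are hypothetical and only there to illustrate the arithmetic:

```python
def epoch_averages(batch_sizes, batch_mean_losses, batch_corrects):
    """Return (mean loss per sample, accuracy) over one epoch."""
    total_samples = sum(batch_sizes)
    # Undo the per-batch mean by weighting each batch loss by its size.
    total_loss = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
    total_correct = sum(batch_corrects)
    return total_loss / total_samples, total_correct / total_samples


# Made-up epoch with three batches; the last batch is smaller.
sizes = [32, 32, 16]
losses = [2.0, 1.0, 4.0]
corrects = [8, 16, 4]

mean_loss, accuracy = epoch_averages(sizes, losses, corrects)
print(mean_loss, accuracy)  # 2.0 0.35

# Dividing by the number of batches instead gives an "accuracy" above 1.
wrong_accuracy = sum(corrects) / len(sizes)
print(wrong_accuracy)
```

In the real loop, this would correspond to dividing by `len(train_loader.dataset)`, assuming the loader wraps a dataset with a defined length.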
I can’t find any issues in what I have written, so I am sharing it here in the hope that someone can spot what’s wrong. I also have a question about fine-tuning: the pretrained network was trained for image classification with 1000 classes on ImageNet, and I’m using it for age classification with 101 classes. Do I need to do anything more to adapt the network to a different classification task such as this one, or is it enough to change the output layer to 101 neurons as I did?
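To make the fine-tuning question concrete, one variant I am considering is freezing the pretrained layers at first and training only the new output layer. This is a minimal sketch on a small stand-in network. The layer sizes and the learning rate are assumptions for illustration, and the stand-in is not the real VGG16:

```python
import torch
import torch.nn as nn

# Hypothetical small stand-in for a pretrained backbone plus a new head.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 101)  # new 101-way output layer, randomly initialised

# Freeze the "pretrained" part so only the new head is updated.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

n_trainable = sum(p.numel() for p in trainable)
print(n_trainable)  # 32 * 101 weights + 101 biases = 3333
```

I don’t know whether freezing is actually needed here or whether training all layers with a small learning rate would work just as well, which is part of what I am asking.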
I hope I have explained the issue well enough. If you need more clarification, please let me know and I will try to make myself clearer. Thanks in advance.