I am trying to finetune an inception_v3 model and I notice that training is quite unstable. I want to check whether my setup is correct.
Preprocessing
All the images used to train the models in the torchvision model zoo are normalized with:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
But because inception_v3's weights were copied over from TensorFlow, the normalized images need to undergo another transformation:
x = x.clone()
x[:, 0] = x[:, 0] * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
x[:, 1] = x[:, 1] * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
x[:, 2] = x[:, 2] * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
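As a sanity check on the preprocessing: the per-channel re-transformation above, applied after the standard zoo normalization, should collapse algebraically to simply normalizing with mean 0.5 and std 0.5 per channel. The following sketch (a vectorized version of the three lines above, with random stand-in images) verifies that:

```python
import torch

# ImageNet statistics used by the standard torchvision normalization.
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

img = torch.rand(2, 3, 4, 4)           # stand-in images in [0, 1]
x = (img - mean) / std                 # standard model-zoo normalization

# Vectorized form of the per-channel re-transformation from the post.
y = x * (std / 0.5) + (mean - 0.5) / 0.5

# (img - mean)/0.5 + (mean - 0.5)/0.5 simplifies to (img - 0.5)/0.5.
y_direct = (img - 0.5) / 0.5
print(torch.allclose(y, y_direct, atol=1e-6))  # True
```

So the two-step pipeline is equivalent to a single `Normalize(mean=[0.5]*3, std=[0.5]*3)`, which is what `transform_input=True` reproduces internally.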
Model changes
The last layer of the model and the auxiliary classifier are replaced with new fully connected layers to match my output vector.
class Custom(Inception3):
    def __init__(self, num_classes=28, aux_logits=False, transform_input=True):
        Inception3.__init__(self, 1000, aux_logits, transform_input)
        self.load_state_dict(model_zoo.load_url(model_urls['inception_v3_google']))
        if aux_logits:
            self.AuxLogits = InceptionAux(768, num_classes)
        self.fc = nn.Linear(2048, num_classes)
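The replacement heads can be shape-checked in isolation, without downloading the pretrained weights. A minimal sketch using the layer sizes from the code above (2048 features into the final head, 768 into the auxiliary head) and random stand-in features:

```python
import torch
import torch.nn as nn

# Layer sizes from the class above: the final fc maps 2048 -> num_classes,
# the auxiliary head's fc maps 768 -> num_classes.
num_classes = 28
fc = nn.Linear(2048, num_classes)
aux_fc = nn.Linear(768, num_classes)

pooled = torch.randn(4, 2048)      # stands in for Inception's pooled features
aux_pooled = torch.randn(4, 768)   # stands in for the auxiliary branch features
out = fc(pooled)
aux_out = aux_fc(aux_pooled)
print(out.shape, aux_out.shape)    # torch.Size([4, 28]) torch.Size([4, 28])
```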
Loss
During training, the losses of the auxiliary classifier and the final classifier are summed into one loss:
y_pred_end, y_pred_middle = model(x)
cw = class_weigths_idx(idx, Y)
loss_1 = F.binary_cross_entropy(y_pred_end, y, cw)
loss_2 = F.binary_cross_entropy(y_pred_middle, y, cw)
loss = loss_1 + loss_2
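One thing worth checking here: `F.binary_cross_entropy` expects its inputs to already be probabilities in [0, 1], so if the replaced `fc` layers emit raw logits, a sigmoid must be applied first (or `F.binary_cross_entropy_with_logits` used instead, which is numerically more stable). A self-contained sketch with made-up shapes (batch of 8, 28 labels) and a stand-in for the class-weight tensor:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch of 8 samples, 28 independent labels.
# sigmoid guarantees the predictions lie strictly in (0, 1), as
# F.binary_cross_entropy requires.
y_pred_end = torch.sigmoid(torch.randn(8, 28))     # final classifier output
y_pred_middle = torch.sigmoid(torch.randn(8, 28))  # auxiliary classifier output
y = torch.randint(0, 2, (8, 28)).float()           # multi-label targets
cw = torch.ones(8, 28)                             # stand-in class weights

loss_1 = F.binary_cross_entropy(y_pred_end, y, cw)
loss_2 = F.binary_cross_entropy(y_pred_middle, y, cw)
loss = loss_1 + loss_2
print(loss.item())
```

If the model's heads output logits directly, BCE on unbounded values is a plausible source of the instability described below.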
Learning rate
I’ve set a small learning rate for the pre-trained layers and a higher learning rate for the new fully connected layer.
lr1 = 1e-7
lr2 = 1e-5
lr3 = 1e-3
optimizer = torch.optim.Adam([
    {"params": model.Conv2d_1a_3x3.parameters(), "lr": lr1},
    {"params": model.Conv2d_2a_3x3.parameters(), "lr": lr1},
    {"params": model.Conv2d_2b_3x3.parameters(), "lr": lr1},
    {"params": model.Conv2d_3b_1x1.parameters(), "lr": lr1},
    {"params": model.Conv2d_4a_3x3.parameters(), "lr": lr1},
    {"params": model.Mixed_5b.parameters(), "lr": lr2},
    {"params": model.Mixed_5c.parameters(), "lr": lr2},
    {"params": model.Mixed_5d.parameters(), "lr": lr2},
    {"params": model.Mixed_6a.parameters(), "lr": lr2},
    {"params": model.Mixed_6b.parameters(), "lr": lr2},
    {"params": model.Mixed_6c.parameters(), "lr": lr2},
    {"params": model.Mixed_6d.parameters(), "lr": lr2},
    {"params": model.Mixed_6e.parameters(), "lr": lr2},
    {"params": model.Mixed_7a.parameters(), "lr": lr2},
    {"params": model.Mixed_7b.parameters(), "lr": lr2},
    {"params": model.Mixed_7c.parameters(), "lr": lr2},
    {"params": model.fc.parameters(), "lr": lr3},
])
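The explicit groups above can also be built in a loop keyed on the module-name prefixes, which is less error-prone when layers are added or removed. A sketch using a toy model whose child names mirror the real Inception3 attributes (the toy layers themselves are arbitrary stand-ins):

```python
import torch
import torch.nn as nn

# Toy stand-in: only the attribute-name pattern matches the real Inception3.
model = nn.Module()
model.Conv2d_1a_3x3 = nn.Conv2d(3, 8, 3)
model.Mixed_5b = nn.Conv2d(8, 8, 3)
model.fc = nn.Linear(8, 28)

# Map each name prefix to its learning rate (values from the post).
prefix_lrs = {"Conv2d_": 1e-7, "Mixed_": 1e-5, "fc": 1e-3}

param_groups = []
for name, module in model.named_children():
    lr = next(v for k, v in prefix_lrs.items() if name.startswith(k))
    param_groups.append({"params": module.parameters(), "lr": lr})

optimizer = torch.optim.Adam(param_groups)
print([g["lr"] for g in optimizer.param_groups])  # [1e-07, 1e-05, 0.001]
```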
Is this a proper configuration for finetuning the Inception network? I notice that my training is very unstable and the loss seems to increase instead of decrease, even though the learning rates seem quite low.