I am trying to update the feature extractor and the classifier with different learning rates, so I split the network into two parts and use parameter groups in the optimizer to train them.
import torch
import torch.nn as nn
from collections import OrderedDict
from torchvision import models
from tqdm import trange

resnet50 = models.resnet50(pretrained=True)
# Map the 1000-dim ImageNet logits to my 31 classes
classifier = nn.Sequential(OrderedDict([("classifier", nn.Linear(1000, 31))]))
resnet50 = nn.Sequential(resnet50, classifier)

# Split the combined model back into its two top-level children
featExtractor = nn.Sequential(*(list(resnet50.children())[:-1])).cuda()
classifierModel = nn.Sequential(*(list(resnet50.children())[-1:])).cuda()

# One parameter group per sub-network; the first group gets an explicit lr,
# the second falls back to the optimizer-wide default of 5e-4
clf_optim = torch.optim.Adam([{'params': featExtractor.parameters(), 'lr': 1e-4},
                              {'params': classifierModel.parameters()}], lr=5e-4)
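One quick way to sanity-check this setup, independent of the training loop, is to look at the learning rate stored in each of the optimizer's parameter groups. A minimal sketch (using tiny stand-in linear layers instead of the ResNet, so it runs without any weights):

```python
import torch
import torch.nn as nn

# Stand-ins for featExtractor and classifierModel
featExtractor = nn.Linear(8, 4)
classifierModel = nn.Linear(4, 2)

clf_optim = torch.optim.Adam([{'params': featExtractor.parameters(), 'lr': 1e-4},
                              {'params': classifierModel.parameters()}], lr=5e-4)

# Group 0 keeps its explicit lr; group 1 falls back to the optimizer default
for i, group in enumerate(clf_optim.param_groups):
    print(i, group['lr'])
```

If the two printed rates are 1e-4 and 5e-4, the groups are wired the way you intended.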
# Note: calling next(iter(amazonData)) inside the loop would rebuild the
# iterator every step and always yield the same first batch, so keep a
# single iterator and restart it only when it is exhausted
dataIter = iter(amazonData)
for epoch in trange(epochs, leave=False):
    for _ in trange(iterations, leave=False):
        try:
            source_x, source_y = next(dataIter)
        except StopIteration:
            dataIter = iter(amazonData)
            source_x, source_y = next(dataIter)
        source_x, source_y = source_x.to(device), source_y.to(device)
        for _ in range(k_clf):
            features = featExtractor(source_x)
            out = classifierModel(features)
            clf_loss = clf_criterion(out, source_y)
            clf_optim.zero_grad()
            clf_loss.backward()
            clf_optim.step()
    print("total_loss: ", clf_loss.item())
PS: I know I could have just used resnet50 instead of featExtractor and classifier instead of classifierModel, but this is just the shortest version of what I am doing, and I am basically looking to validate the idea.
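For what it's worth, the same two-rate setup can also be expressed without splitting the network at all, by indexing into the combined nn.Sequential when building the parameter groups. A sketch with small stand-in layers (not the actual ResNet):

```python
import torch
import torch.nn as nn

# Stand-in for the combined model: backbone at index 0, classifier at index 1
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

optim = torch.optim.Adam([{'params': model[0].parameters(), 'lr': 1e-4},
                          {'params': model[1].parameters()}], lr=5e-4)

# One dummy update to confirm both groups train
x = torch.randn(3, 8)
loss = model(x).sum()
optim.zero_grad()
loss.backward()
optim.step()
```

Both formulations are equivalent; the param-group dicts are what decide which tensors get which learning rate, not how the modules are packaged.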