I have a model whose last layers look like this:
...
out = self.relu(self.bn1(out))
out = F.avg_pool2d(out, 8)
out = out.view(-1, self.nChannels)
out = self.fc(out)
return out
The output of this last layer has torch.Size([18, 2048]) (batch size 18).
I have trained the model and saved a .pth checkpoint. I would now like to do fine-tuning/transfer learning for a downstream task by adding a Linear layer on top and then training. What is the right way to do this?
Do I initialize the whole network first and then load the model_state_dict, or do I just load the entire model from the path?
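To make the question concrete, here is a minimal, self-contained sketch of the state_dict approach as I understand it; the tiny Net below is just a stand-in for my real architecture (whose constructor takes args1, args2, args3), and the checkpoint is created on the fly only so the snippet runs:

```python
import os
import tempfile

import torch
import torch.nn as nn


# Tiny stand-in for my real Net (the real constructor takes args1, args2, args3).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2048)

    def forward(self, x):
        return self.fc(x)


# Pretend this is the checkpoint from my earlier training run.
PATH = os.path.join(tempfile.mkdtemp(), "pretrained.pth")
trained = Net()
torch.save(trained.state_dict(), PATH)

# Re-create the architecture first, then load the weights in place.
# Note: load_state_dict modifies the model in place; its return value
# is a result object, not the model itself.
model = Net()
model.load_state_dict(torch.load(PATH))
model.eval()

# The loaded weights now match the trained ones.
print(torch.equal(model.fc.weight, trained.fc.weight))  # True
```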
Essentially, I would like to have something like this for the transfer learning/downstream training:
import torch.nn as nn
model = Net(args1, args2, args3)
model.load_state_dict(torch.load(PATH))  # load_state_dict works in place; don't reassign model
model.eval()
class final_network(nn.Module):
    def __init__(self, use_face=False, num_glimpses=1):
        super(final_network, self).__init__()
        self.pre_network = model
        self.final_task = nn.Sequential(
            nn.Linear(2048, 468),
        )

    def forward(self, x):
        feature = self.pre_network(x)
        feature = feature.view(feature.size(0), -1)
        out = self.final_task(feature)
        return out
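As a sanity check on the shapes, this is how I would wire the wrapper together end to end. DummyBackbone here is only a placeholder that emits [batch, 2048] features the way my real pretrained model does, I pass the backbone in explicitly rather than reading a global, and the freezing loop is only there because I assume I may want to train just the new head:

```python
import torch
import torch.nn as nn


# Placeholder backbone: it only needs to emit [batch, 2048] features,
# like the fc output of my real pretrained model.
class DummyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 2048)

    def forward(self, x):
        return self.fc(x)


class final_network(nn.Module):
    def __init__(self, pre_network):
        super(final_network, self).__init__()
        self.pre_network = pre_network
        self.final_task = nn.Sequential(
            nn.Linear(2048, 468),
        )

    def forward(self, x):
        feature = self.pre_network(x)
        feature = feature.view(feature.size(0), -1)
        return self.final_task(feature)


net = final_network(DummyBackbone())

# Optionally freeze the pretrained part so only the new Linear head trains.
for p in net.pre_network.parameters():
    p.requires_grad = False

out = net(torch.randn(18, 32))
print(out.shape)  # torch.Size([18, 468])
```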
Or do I save and load the entire model (checkpoint), as described in "Saving and loading models for inference in PyTorch" (PyTorch Tutorials 1.12.1+cu102 documentation)? I am a little confused, as most of the information I found online is for popular network architectures.
Many thanks.