Hi,
I am working on a problem that requires pre-training a first model and then fine-tuning it together with a second model. During pre-training, the first model needs a classification layer in order to compute a loss. However, I do not need that classification layer when using the pre-trained model with my second model; I only need the output (in my case, the hidden state of an LSTM). At the second stage, when loading the pre-trained model, if I remove the classification layer, does PyTorch automatically ignore the weights of that layer and keep the rest when fine-tuning? Are the weights of each sub-module saved separately and then used as needed? This is a snippet of the code:
import torch
import torch.nn as nn

class LSTMD(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(LSTMD, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(num_classes, hidden_size)
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        self.classification = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # zero-initialized hidden and cell states (batch size 1)
        h, c = torch.zeros(1, self.hidden_size), torch.zeros(1, self.hidden_size)
        x = self.embedding(x)
        h, c = self.lstm(x, (h, c))
        out = self.classification(h)
        return out

model = LSTMD(512, 512, 1000)
model_optimizer = torch.optim.Adam(model.parameters())  # the optimizer type here is just an example
state = {'model': model, 'model_optimizer': model_optimizer}
torch.save(state, 'saved.pth')
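Just to make the second part of my question concrete: as far as I understand, all parameters end up in the model's state_dict with keys prefixed by the sub-module name, so I could inspect them roughly like this (my own sketch):

# Sketch: list how the parameters are grouped per sub-module.
# I expect keys like embedding.weight, lstm.weight_ih, lstm.weight_hh,
# lstm.bias_ih, lstm.bias_hh, classification.weight, classification.bias
for name in model.state_dict().keys():
    print(name)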
After that, I want to load the pre-trained model, use only h from the LSTM output, and discard the classification layer when fine-tuning:
class New(nn.Module):
    def __init__(self, checkpoint):
        super(New, self).__init__()
        checkpoint = torch.load(checkpoint)
        old_model = checkpoint['model']
        # I removed out = self.classification(h) from the old model
        modules = list(old_model.children())[:-1]
        self.new_model = nn.Sequential(*modules)

    def forward(self, x):
        out = self.new_model(x)
        return out
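For reference, this is roughly how I instantiate and call the new model (the input below is just a dummy example, not my real data):

new_model = New('saved.pth')
dummy_x = torch.randint(0, 1000, (1,))  # a single token index, as the embedding expects
out = new_model(dummy_x)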
But then I get the error: forward() takes _ positional arguments but _ were given (the exact numbers depend on the arguments in my case).
However, when I remove
self.classification = nn.Linear(hidden_size, num_classes)
and
out = self.classification(h)
from the first model and then load the checkpoint, I get no error. But I'm afraid this is not the correct way, and that the model will have problems fine-tuning later. The problem mainly happens when I call the forward function on the new model created by nn.Sequential.
Even if I keep all the layers by doing:
modules = list(old_model.children())
self.new_model = nn.Sequential(*modules)
and then run the forward function, it doesn't work. Something seems to go wrong inside the Sequential operation. Any clue what the problem might be?
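For what it's worth, my understanding is that nn.Sequential just calls each child in order and feeds one output into the next, roughly like this (my own sketch, not the actual implementation):

# My mental model of what nn.Sequential does in forward:
def sequential_forward(modules, x):
    for m in modules:
        x = m(x)  # for the LSTMCell, this output is a tuple (h, c), not a single tensor
    return x

So maybe the issue is that the LSTMCell's output does not match what the next module (or the caller) expects?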
Thanks!