Hi all,
I’m currently working on two models that train on separate (but related) types of data. I’d like to make a combined model that takes in an instance of each type of data, runs each one through the corresponding individually pre-trained model, and then has a few feed-forward layers at the top that process the combined result of the two individual models.
So far, I know that I can modify forward() to take in both inputs, so I have just copied the architectures of my individual models into the combined one, process both inputs separately by running them through the correct layers in forward(), and then combine the results as described. What I can’t figure out is how to use the pre-trained models in the combined model, rather than training the same architectures from scratch.
I’d really appreciate any input! Thanks!
The logic you described seems to be reasonable.
I’m not sure I understand the issue correctly, but it seems you are not sure how to restore the pretrained models and use them in the new model.
If that’s the case, you could instantiate both pretrained models, load the state_dicts, and pass them to your new model.
I’ve created a small example, since it might be easier to just see the code:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModelA(nn.Module):
    def __init__(self):
        super(MyModelA, self).__init__()
        self.fc1 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x


class MyModelB(nn.Module):
    def __init__(self):
        super(MyModelB, self).__init__()
        self.fc1 = nn.Linear(20, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x


class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.classifier = nn.Linear(4, 2)

    def forward(self, x1, x2):
        x1 = self.modelA(x1)
        x2 = self.modelB(x2)
        # Concatenate both outputs (2 + 2 features) and feed them to the classifier
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.relu(x))
        return x


# Create models
modelA = MyModelA()
modelB = MyModelB()

# Load the pretrained state_dicts (PATH_A / PATH_B point to the saved checkpoints)
modelA.load_state_dict(torch.load(PATH_A))
modelB.load_state_dict(torch.load(PATH_B))

# Pass the pretrained models to the ensemble
model = MyEnsemble(modelA, modelB)
x1, x2 = torch.randn(1, 10), torch.randn(1, 20)
output = model(x1, x2)
Thanks so much! Exactly what I needed; I wasn’t aware I could use whole models in the forward (but I guess that makes sense, as they all subclass nn.Module).
@smth - Not a homework problem, just trying to learn something completely out of my domain.
If I understand it correctly, if modelA is resnet152, I can change self.fc1 to self.fc1 = nn.Linear(2048, x), where 2048 is in_features for resnet152 and x is the desired out_features. We can do the same for modelB. But in self.classifier = nn.Linear(4, 2), where did the numbers 4 and 2 come from?
I understand these are very trivial questions for someone of your stature in programming, but not for newbies… I appreciate your prompt response and making sure you are not solving someone else’s homework.
The number of input features for self.classifier is defined by the sum of the output features of both “pre-ensemble” models.
As you can see, MyModelA and MyModelB each output 2 values (from their self.fc1). These features are concatenated in MyEnsemble and passed to self.classifier, which therefore takes 4 input features. The 2 output features of self.classifier are simply the number of target classes in this toy example and can be set to whatever your task needs.
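For example, with torchvision classifiers the same arithmetic applies. This is only a rough sketch (resnet152 and the chosen output sizes are placeholders, not something from your code):

import torch.nn as nn
import torchvision.models as models

# Hypothetical output sizes for the two branches; choose whatever fits your task
out_A, out_B = 512, 512

modelA = models.resnet152(pretrained=True)
modelA.fc = nn.Linear(modelA.fc.in_features, out_A)  # in_features == 2048 for resnet152

modelB = models.resnet152(pretrained=True)
modelB.fc = nn.Linear(modelB.fc.in_features, out_B)

# The ensemble classifier takes the concatenated features as input
classifier = nn.Linear(out_A + out_B, 2)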
@ptrblck, thank you for the toy example. How would you change the script if modelB’s input was taken from the output of one of modelA’s layers, and the ensemble output had 2 heads? In the modification below, fc1 of modelA is run twice (not efficient):
class MyModelA(nn.Module):
    def __init__(self):
        super(MyModelA, self).__init__()
        self.fc1 = nn.Linear(20, 10)
        self.fc2 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x


class MyModelB(nn.Module):
    def __init__(self):
        super(MyModelB, self).__init__()
        self.fc1 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.fc1(x)
        return x


class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB

    def forward(self, x1, x2):
        head1 = self.modelA(x1)
        head2 = self.modelB(self.modelA.fc1(x1))  # modelA.fc1 runs a second time here
        x = torch.cat((head1, head2), dim=1)
        return x
I see! That requires modifying the forward method of modelA?
Here are a few words about what I am trying to do.
I have ModelA : [input/image -> encoder1 -> header1 -> output1].
The model was trained and has pretty good accuracy.
Now I want to add a 2nd task.
For that purpose, I run the entire dataset through ModelA and, for each example, save the activation at the last layer of the encoder as a numpy array file (a rough sketch of this step is at the end of this post).
Those numpy arrays are then used as the inputs to modelB [input2 -> header2 -> output2]. I then run the optimization and save the weights of modelB.
Finally, I want to combine modelA and modelB into a combo model with 2 heads: [input/image -> encoder1 -> (header1 -> output1, header2 -> output2)].
This combo_model is only used at test time.
(I understand I could start with the combo-model architecture, train the modelA branch while freezing the modelB branch, and then optimize modelB while freezing modelA. IMHO, that approach is more cumbersome!)
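For reference, the activation-export step described above could look roughly like this. It is only a sketch: encoder1 mirrors the structure of ModelA above, while the dataset loop and the output path are placeholders:

import numpy as np
import torch

features = {}

def save_activation(module, inp, out):
    # Keep the encoder output of the current forward pass
    features["encoder1"] = out.detach().cpu()

# Register a forward hook on the last layer of the encoder
handle = modelA.encoder1.register_forward_hook(save_activation)

modelA.eval()
with torch.no_grad():
    for i, (x, _) in enumerate(dataset):
        modelA(x.unsqueeze(0))
        np.save(f"activations/sample_{i}.npy", features["encoder1"].numpy())

handle.remove()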
You don’t need to retrain the models, but could just redefine the forward of modelA and load its state_dict.
The state_dict contains only the parameters of the model, so you are free to manipulate the actual forward method as you wish, since the computation graph is created dynamically.
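To make that concrete, here is a sketch reusing the toy modules from your post. modelA’s forward is redefined to also return the intermediate fc1 activation, so fc1 only runs once:

class MyModelA(nn.Module):
    def __init__(self):
        super(MyModelA, self).__init__()
        self.fc1 = nn.Linear(20, 10)
        self.fc2 = nn.Linear(10, 2)

    def forward(self, x):
        # Return the intermediate fc1 activation together with the head output
        features = self.fc1(x)
        head1 = self.fc2(features)
        return features, head1


class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB

    def forward(self, x1):
        features, head1 = self.modelA(x1)  # fc1 is executed only once
        head2 = self.modelB(features)
        return head1, head2

Since the parameters and their names are unchanged, modelA.load_state_dict(torch.load(PATH_A)) works exactly as before.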
Hi, I’m working with the exact same structure as in your example above. How can I keep the weights of the two pre-trained models A and B frozen and train only the classifier layer of the ensemble model?
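One common way to do this (a sketch only, assuming the MyEnsemble structure from the example above) is to disable gradients for the two pretrained branches and give the optimizer only the classifier’s parameters:

# Freeze the pretrained branches
for param in model.modelA.parameters():
    param.requires_grad = False
for param in model.modelB.parameters():
    param.requires_grad = False

# Optimize only the ensemble's classifier layer
optimizer = torch.optim.SGD(model.classifier.parameters(), lr=1e-3)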