Merging 2 different models to create a multi-model

Greetings,

I have 2 different models - A (a GNN) and B (an LSTM). I am trying to combine these models to predict the same output “y”. The two models share only their input and output. I found 2 different posts (Merging two models & Combining Trained Models in PyTorch - #2 by ptrblck) and noticed that they take different approaches. I wonder which one everyone would recommend following.

About defining “optimizer = optim.X” where X can be SGD or Adam: if I follow the post from @ptrblck (Merging two models), does the optimizer need to be "optimizer = optim.Adam(modelA.parameters() + modelB.parameters())"?

About model.train(): would the post from @ptrblck (Merging two models) take care of that? loss.backward() would depend on each model.

I have just started learning PyTorch recently, so I am very new to this field. I’d really appreciate any input! Thank you for the help! - Tom

The linked use cases are a bit different, as the first one (merging two models) seems to pass features from one model to the other and could train these models in an end2end manner.
The “ensemble” approach uses two pretrained models and combines their output features (often from the penultimate layer) to train a new classifier, which could boost the performance compared to each model on its own. However, if you want to train all models from scratch, take a look at the Stacking section of ensemble learning, as the training procedure uses “stages” and differs from the “standard” training routine.
In particular, the classifier expects the stage0 models to be trained already and expects to see their predictions while being trained.
This would mean that an end2end training might yield a worse performance and you would have to create separate training and validation splits to train each stage.
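For illustration, here is a minimal sketch of that staged training idea, assuming hypothetical model classes (MyModelA, MyModelB, MyClassifier) and two disjoint data splits (loader_stage0, loader_stage1) that you would create yourself:

import torch
import torch.nn as nn

modelA, modelB = MyModelA(), MyModelB()   # stage0 base models (hypothetical)
classifier = MyClassifier()               # stage1 meta-classifier (hypothetical)
criterion = nn.CrossEntropyLoss()

# Stage 0: train each base model independently on the first split.
optA = torch.optim.Adam(modelA.parameters())
optB = torch.optim.Adam(modelB.parameters())
for x, y in loader_stage0:
    optA.zero_grad()
    criterion(modelA(x), y).backward()
    optA.step()
    optB.zero_grad()
    criterion(modelB(x), y).backward()
    optB.step()

# Stage 1: freeze the base models and train the classifier on their
# predictions computed over a separate split.
optC = torch.optim.Adam(classifier.parameters())
modelA.eval()
modelB.eval()
for x, y in loader_stage1:
    with torch.no_grad():
        feats = torch.cat((modelA(x), modelB(x)), dim=1)
    optC.zero_grad()
    loss = criterion(classifier(feats), y)
    loss.backward()
    optC.step()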

Hello @ptrblck, thank you for your thoughts and recommendations.

May I ask you about each scenario?

I did not think about the case where an end2end training could possibly yield a worse performance. Based on your comments, I think I will need to write an additional nn.Module class to combine the 2 models as you showed in a post; but can I write it without passing features from one model to another? Also, do you have any suggestions on dealing with the optimizer and the def train() function? (What I am looking for is whether there is an additional step to write for the optimizer and train(); I am stuck looking for a way to combine y_pred = model(features_from_A) and y_pred = model(features_from_B).) Because each model works well on its own (but still not great), I want to boost the prediction performance by combining the 2 models to predict the output.

In the case of the “ensemble” approach where I want to train the 2 models from scratch (based on your provided link), I have also read several papers in which the authors combined their models (the type of model does not matter) at the FC layer. May I ask whether I still need to write a new “nn.Module class” to do this work?

Yes, you do not need to combine the “base models” together and could pass their features to a final classifier.

Not necessarily, but a custom nn.Module might be more convenient.
You can directly pass the outputs of any module to another module, e.g. as:

# instantiate the two base models and the final classifier
modelA = MyModelA()
modelB = MyModelB()
classifier = MyClassifier()

# feed the same input to both models and stack their outputs
outA = modelA(x)
outB = modelB(x)
out = torch.stack((outA, outB), dim=1)

# the classifier creates the final prediction from the stacked outputs
out = classifier(out)

However, your code might be cleaner if you wrap this logic into a custom module.
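A minimal sketch of such a wrapper, reusing the hypothetical MyModelA, MyModelB, and MyClassifier classes from the snippet above, could look like this:

import torch
import torch.nn as nn

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, classifier):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.classifier = classifier

    def forward(self, x):
        # run both base models on the same input and stack their outputs
        outA = self.modelA(x)
        outB = self.modelB(x)
        out = torch.stack((outA, outB), dim=1)
        # let the classifier create the final prediction
        return self.classifier(out)

model = MyEnsemble(MyModelA(), MyModelB(), MyClassifier())
out = model(x)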

Wonderful! Thank you for the code snippet. I will try this out to see which direction fits my approach better!

Hello @ptrblck,
May I ask whether outA is a predicted value obtained from model A in this case? I think that outA is a feature from the fully-connected layer of model A; however, I am not quite sure. Could you please confirm whether this is correct?

Finally, my two models (A & B) are trained with different approaches (one with a DataLoader/BucketIterator and the other with a traditional batch loop). May I ask whether I need to use the same approach to train both models in order to stack their outputs for a combined model? I think the answer might be no, but I still want to hear the thoughts of an expert.

Based on this link (Combining Trained Models in PyTorch - #66 by rudascience), outA and outB are collected after the FC layer. Because of this, my first question is answered.

Following up on this post (Combining Trained Models in PyTorch - #68 by rudascience), it seems like I only need to define an optimizer and call classifier.cuda() for the classifier instead of for all 3 models (i.e., modelA, modelB and classifier)? Is this true or am I missing something in the post?

You can define the models as you want, so you can return the activation outputs (features) from the penultimate layer or the logit predictions.
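For illustration, a minimal sketch of a base model that can return either its penultimate features or its logits (all names and shapes here are placeholders):

import torch
import torch.nn as nn

class MyModelA(nn.Module):
    def __init__(self, in_features=128, hidden=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x, return_features=False):
        feats = self.backbone(x)      # penultimate activations
        if return_features:
            return feats              # features for a downstream classifier
        return self.fc(feats)         # logit predictions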

For the optimizer: it also depends on your actual use case. If you want to train only the classifier, push only its parameters to the optimizer. On the other hand, if you want to train all models, you could use separate optimizers, a single combined one, etc.
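Minimal sketches of these options, reusing the hypothetical modelA, modelB, and classifier instances from the earlier snippets:

import itertools
import torch

# 1) train only the classifier (the base models stay frozen)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# 2) train everything with a single, combined optimizer
optimizer = torch.optim.Adam(
    itertools.chain(modelA.parameters(), modelB.parameters(), classifier.parameters()),
    lr=1e-3)

# 3) or use separate optimizers, e.g. with different learning rates per model
optA = torch.optim.Adam(modelA.parameters(), lr=1e-4)
optB = torch.optim.Adam(modelB.parameters(), lr=1e-4)
optC = torch.optim.Adam(classifier.parameters(), lr=1e-3)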
