How can I connect a new neural network after a trained neural network and optimize them together

Hello, I am new to PyTorch, and this question has cost me a couple of days. I am trying to connect two different neural networks. Model one is a trained network that I have already saved as a .pth file. I want to put another network with a totally different architecture after it, then combine the two models and train them together. How can I connect them? I mean simply using the output of model one as the input of model two, but with a single optimizer to train both, because I may later put a dozen models after them.

Yes, that would just work.
You can basically treat each model as a “layer” passing one output to the next layer/model.

If you would like to pass all parameters to the optimizer, just concatenate the lists together:

import torch
import torch.nn as nn

model1 = nn.Linear(1, 1)
model2 = nn.Linear(2, 2)

# a single optimizer that updates the parameters of both models
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()),
    lr=1e-3
)
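
For the scenario in the question (a pretrained network followed by a new one), a minimal sketch of one joint training step might look like this. The layer sizes, the file name `model1.pth`, and the assumption that the file holds a `state_dict` are placeholders for illustration, not details from the question.

```python
import torch
import torch.nn as nn

# Hypothetical architectures; replace them with your own definitions.
model1 = nn.Linear(4, 8)   # stands in for the pretrained network
model2 = nn.Linear(8, 2)   # the new network placed after it

# Assumption: the file stores a state_dict that was saved with
# torch.save(model1.state_dict(), "model1.pth"). Uncomment to load it.
# model1.load_state_dict(torch.load("model1.pth"))

# One optimizer over the parameters of both models.
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()),
    lr=1e-3,
)

x = torch.randn(16, 4)        # dummy input batch
target = torch.randn(16, 2)   # dummy regression target

optimizer.zero_grad()
output = model2(model1(x))    # the output of model1 feeds model2
loss = nn.functional.mse_loss(output, target)
loss.backward()               # gradients flow back into both models
optimizer.step()
```

Since autograd tracks the whole chain, a single `backward()` call populates the gradients of both models.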

Thank you so much. This method solves the problem. However, I want to combine the two models into one model and train and optimize them together, because I will be connecting a dozen models one after another. Do you have any idea how to combine different models? Thank you so much.

You can just combine them in the training loop:

for data, target in train_loader:
    output = model1(data)
    output = model2(output)
    output = model3(output)

    # or use a loop instead

or just wrap them in another nn.Module and treat them as layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyHugeModel(nn.Module):
    def __init__(self, models):
        super(MyHugeModel, self).__init__()
        self.models = models
        
    def forward(self, x):
        for model in self.models[:-1]:
            x = F.relu(model(x))
        x = self.models[-1](x)  # don't use relu for last model
        return x
            
class MySmallModel(nn.Module):
    def __init__(self):
        super(MySmallModel, self).__init__()
        self.lin = nn.Linear(10, 10)
        
    def forward(self, x):
        x = self.lin(x)
        return x


models = nn.ModuleList()
for _ in range(10):
    models.append(MySmallModel())

model = MyHugeModel(models)
x = torch.randn(1, 10)
output = model(x)

Using this approach, you can pass all parameters as model.parameters() to your optimizer.
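
One detail worth checking is that the sub-models are stored in an nn.ModuleList rather than a plain Python list, since only registered submodules show up in model.parameters(). The following sketch illustrates the difference; the `Wrapper` class is a hypothetical stand-in for the wrapper model above.

```python
import torch.nn as nn

class Wrapper(nn.Module):
    """Hypothetical wrapper that just stores whatever container it is given."""
    def __init__(self, models):
        super().__init__()
        self.models = models

layers = [nn.Linear(10, 10) for _ in range(3)]

plain = Wrapper(layers)                       # plain Python list: not registered
registered = Wrapper(nn.ModuleList(layers))   # nn.ModuleList: registered

n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)  # prints "0 6" (3 layers x weight and bias)
```

With the plain list, an optimizer built from `plain.parameters()` would receive no parameters at all.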

Thank you so much, this is exactly what I want. Thank you.
By the way, may I ask another question, please? I tried to make the architectures of the neural networks more complex, so in some models some layers will skip over other layers. How can I write this skip connection? Should I do this in the forward function?

Yes, the forward method is the right place to add skip connections.
Have a look at the resnet implementation for an example.
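
As a rough illustration (not the torchvision code itself), a skip connection in forward could look like the sketch below. The `SkipBlock` name, the channel count, and the 3x3 kernels are assumptions chosen so that the two tensors being added have the same shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    """Hypothetical block: two convolutions plus an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps height and width unchanged for a 3x3 kernel
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        identity = x                 # keep the input for the skip path
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity         # the skip connection
        return F.relu(out)

block = SkipBlock(8)
x = torch.randn(2, 8, 28, 28)
y = block(x)
```

The addition only works because both paths produce tensors of identical shape.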

Thank you so much. But I have run into another problem. I let the function randomly choose different convolution layers with different kernel sizes, and I use skip connections in this function. I try to sum the outputs of the layers joined by a skip connection, but the tensors have different sizes and cannot be added, so PyTorch gives me an error:

The size of tensor a (26) must match the size of tensor b (18) at non-singleton dimension 3

I understand that the error says one tensor is 18x18 while the other is 26x26. Could you please suggest some methods to change the size of the output?

You could use interpolation (or cropping) to create equally sized activations, similar to what is currently done with downsample in the resnet example.
Note that the optional downsample branch there is a strided 1x1 convolution used to match the number of channels (and the stride) of the main branch, not to resize to an arbitrary spatial size, so you cannot copy the code directly.
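
Both options could be sketched as follows, using the 26x26 and 18x18 sizes from the error message. The batch size, channel count, and the choice of bilinear interpolation and center cropping are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

a = torch.randn(1, 16, 26, 26)   # larger activation
b = torch.randn(1, 16, 18, 18)   # smaller activation

# Option 1: resize b to the spatial size of a, then add.
b_up = F.interpolate(b, size=a.shape[-2:], mode="bilinear", align_corners=False)
summed = a + b_up

# Option 2: center-crop a to the spatial size of b, then add.
dh = (a.shape[-2] - b.shape[-2]) // 2
dw = (a.shape[-1] - b.shape[-1]) // 2
a_crop = a[..., dh:dh + b.shape[-2], dw:dw + b.shape[-1]]
summed_crop = a_crop + b
```

Another way to avoid the mismatch in the first place, for odd kernel sizes at stride 1, is to create each convolution with `padding=kernel_size // 2`, which keeps the spatial size unchanged.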

Thank you so much. That is really helpful. :clap::clap::clap: Can I ask one more question, please? I wrote code that lets the system randomly choose between convolution layers, where the kernel size is the only difference between them. For example, three conv layers: 3x3, 1x1, and 5x5. How can I get the kernel sizes and save them in an array or a string after I create the model?

What would you want to do with this information?
Is your model choosing one of these conv layers randomly for each batch or only once at initialization?

I am working on NAS (Neural Architecture Search). My prototype chooses different types of convolution layers when I create a new model. I need to save that information so I can find the best-performing architecture.

I wrote a separate file that defines the different convolution layers. A function in it returns a random layer with a random kernel size, and this function is used in the model definition file. There, a loop adds the layers to the model, and a special forward function allows skip connections between layers.

Ah OK, thanks for the information.
You can get the kernel size using conv.kernel_size and store it in your config file.
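
For instance, the kernel sizes of all convolutions in a model could be collected like this. The `random_conv` helper, the candidate sizes, and the JSON config format are hypothetical stand-ins for the setup described above.

```python
import json
import random
import torch.nn as nn

def random_conv(in_ch, out_ch):
    """Hypothetical helper: a conv layer with a randomly chosen kernel size."""
    k = random.choice([1, 3, 5])
    return nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

model = nn.Sequential(*[random_conv(8, 8) for _ in range(4)])

# kernel_size is stored as a tuple, e.g. (3, 3)
kernel_sizes = [
    m.kernel_size for m in model.modules() if isinstance(m, nn.Conv2d)
]

# Serialize to a string; JSON turns the tuples into lists.
config = json.dumps({"kernel_sizes": kernel_sizes})
print(config)
```

Iterating over `model.modules()` also reaches convolutions nested inside submodules, so the same loop should work for a custom model class.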

Oh, thank you so much. I never thought a single line of code could do this. Anyway, thank you so much for answering so many of my questions. You are so kind.

Hello ptrblock,
I am trying to connect three different GCNs together,
but the solution you proposed here does not work.
Here is my code,
a very simple GCN:

import torch
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(300, 256)
        self.conv2 = GCNConv(256, 1)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        return x

then

from torch.nn import MSELoss

# here we instantiate all of the variables necessary to train the model
model1 = Net().cpu()
model2 = Net().cpu()
model3 = Net().cpu()

# set hyperparameters
optimizer = torch.optim.Adam(model1.parameters(), lr=0.01)
criterion1 = MSELoss()

# load the data for each graph layer
data_L3 = data['graph1']
data_L4 = data['graph2']
data_L5 = data['graph3']

# create a sampler over subsets for each graph layer
loader_L3 = RandomNodeSampler(data_L3, 2)
loader_L4 = RandomNodeSampler(data_L4, 2)
loader_L5 = RandomNodeSampler(data_L5, 2)

then

def train(loader, model):
    # build the optimizer from the model passed in, so its own parameters are updated
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    for epoch in range(2):
        running_loss = 0.0
        model.train()
        for data in loader:
            data = data.cpu()
            # change to float to match the weights of the model
            data.x = data.x.float()
            optimizer.zero_grad()
            logits = model(data.x, data.edge_index)
            loss = criterion1(logits, data.y)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # print the loss here if we want to see it
        print(epoch + 1, running_loss / 2000)
    return model

Then train the three models:

# train each model on the correct data subset
train(loader_L3, model1)
train(loader_L4, model2)
train(loader_L5, model3)

How can I combine them into one single model?
I tried to use
torch.stack([self.GNC1.weight, self.GNC2.weight, self.GNC2.weight], dim=0)
without success.

Can you help, please?

Could you post the error message you were seeing or what exactly is not working, please?

PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier :wink: