How can I connect a new neural network after a trained neural network and optimize them together

Hello, I am new to PyTorch and this question has cost me a couple of days. I am trying to connect two different neural networks together. Model one is a trained NN which I have already saved as a .pth file. I then want to put another NN with a totally different architecture after it, combine the two models, and train them together. How can I connect two models? I mean, simply use the output of model one as the input of model two, but use one optimizer to train them, because I may put a dozen models after it.


Yes, that would just work.
You can basically treat each model as a “layer” passing one output to the next layer/model.

If you would like to pass all parameters to the optimizer, just concatenate the lists together:

import torch
import torch.nn as nn

model1 = nn.Linear(1, 1)
model2 = nn.Linear(2, 2)

optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()),
    lr=1e-3
)
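
For example, a minimal end-to-end sketch of chaining the two models with a single optimizer (the layer sizes, dummy data, and MSE loss below are only placeholders so the shapes line up):

import torch
import torch.nn as nn

# toy models whose shapes chain: model1 outputs 2 features, model2 expects 2
model1 = nn.Linear(1, 2)
model2 = nn.Linear(2, 2)

optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()),
    lr=1e-3
)
criterion = nn.MSELoss()

x = torch.randn(8, 1)       # dummy input batch
target = torch.randn(8, 2)  # dummy target

optimizer.zero_grad()
out = model2(model1(x))     # output of model1 is the input of model2
loss = criterion(out, target)
loss.backward()             # gradients flow through both models
optimizer.step()            # one optimizer updates both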

Thank you so much. This method solves the problem, but I want to combine the two models into one model and train and optimize them together. The reason I want to do this is that I will connect a dozen models one by one. So do you have any idea how to combine different models? Thank you so much.


You can just combine them in the training loop:

for data, target in train_loader:
    output = model1(data)
    output = model2(output)
    output = model3(output)

    # or use a loop instead

or just wrap them in another nn.Module and treat them as layers:

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyHugeModel(nn.Module):
    def __init__(self, models):
        super(MyHugeModel, self).__init__()
        self.models = models
        
    def forward(self, x):
        for model in self.models[:-1]:
            x = F.relu(model(x))
        x = self.models[-1](x)  # don't use relu for last model
        return x
            
class MySmallModel(nn.Module):
    def __init__(self):
        super(MySmallModel, self).__init__()
        self.lin = nn.Linear(10, 10)
        
    def forward(self, x):
        x = self.lin(x)
        return x


models = nn.ModuleList()
for _ in range(10):
    models.append(MySmallModel())

model = MyHugeModel(models)
x = torch.randn(1, 10)
output = model(x)

Using this approach, you can pass all parameters as model.parameters() to your optimizer.
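
As a rough usage sketch on top of the example above (the SGD optimizer, learning rate, dummy target, and MSE loss are just placeholders):

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # covers all submodules
criterion = nn.MSELoss()
target = torch.randn(1, 10)  # dummy target matching the output shape

optimizer.zero_grad()
output = model(x)            # x and model from the snippet above
loss = criterion(output, target)
loss.backward()
optimizer.step()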


Thank you so much, this is exactly what I want. Thank you.
By the way, may I ask another question, please? I am trying to make the architectures of the neural networks more complex, so in some models some layers will have skip connections to other layers. How can I write this skip connection? Should I do it in the forward function?

Yes, the forward method is the right place to add skip connections.
Have a look at the resnet implementation for an example.
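
As a minimal sketch of a skip connection inside forward (the channel count, kernel sizes, and padding below are illustrative and not taken from the ResNet code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    def __init__(self, channels=16):
        super(SkipBlock, self).__init__()
        # padding=1 keeps the spatial size, so the input can be added to the conv output
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        identity = x
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity   # skip connection
        return F.relu(out)

block = SkipBlock()
out = block(torch.randn(1, 16, 24, 24))  # -> [1, 16, 24, 24]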


Thank you so much, but I have run into another issue. I let the function randomly choose convolution layers with different kernel sizes, and I use skip connections in this function. When layers are skip-connected, the tensor sizes differ and cannot be added to each other. I mean, I tried to sum the outputs of different layers along a skip connection, but the tensor sizes are different and PyTorch gave me this error:

The size of tensor a (26) must match the size of tensor b (18) at non-singleton dimension 3

I know this error says one tensor is 18x18 while the other is 26x26. So, could you please tell me some methods to change the size of the output?

You could use interpolation (or cropping) to create equally sized activations, in a similar manner to what is currently done with downsample in the resnet example.
Note that the optional downsampling changes the number of channels, not the spatial size, so you cannot copy that code directly.
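
A small sketch of the interpolation idea (the shapes below just mirror the 26x26 vs. 18x18 mismatch from the error message; the channel counts are assumed to match already):

import torch
import torch.nn.functional as F

skip = torch.randn(1, 8, 26, 26)   # activation arriving via the skip connection
out = torch.randn(1, 8, 18, 18)    # activation from the conv path

# resize the skip activation to the spatial size of the conv output before summing
skip_resized = F.interpolate(skip, size=out.shape[2:], mode='bilinear', align_corners=False)
summed = out + skip_resized        # both are now [1, 8, 18, 18]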


Thank you so much. That is really helpful. :clap::clap::clap: Can I ask one more question, please? I wrote some code that lets the system randomly choose convolution layers with different kernel sizes; the kernel size is the only difference between the convolution layers, for example 3 conv layers with 3x3, 1x1, and 5x5 kernels. How can I get the kernel size information and save it in an array or a string after I create the model?

What would you want to do with this information?
Is your model choosing one of these conv layers randomly for each batch or only once at initialization?

Well, I am working on NAS (Neural Architecture Search). My prototype chooses different types of convolution layers when I create a new model. I need to save that information and find the best-performing one.

I wrote a separate file which defines the different convolution layers. A function in it returns a random layer with a random kernel size, and this function is used in the model's definition file. I mean, in the definition file there is a loop that adds the layers to the model and a special forward function which allows skip connections between layers.

Ah OK, thanks for the information.
You can get the kernel size using conv.kernel_size and store it in your config file.
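
For example, a small sketch that walks the model's modules and records each kernel size (assuming model is the randomly built model you described and the convolutions are nn.Conv2d):

import torch.nn as nn

kernel_sizes = []
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        kernel_sizes.append((name, module.kernel_size))  # e.g. ('layers.0', (3, 3))

print(kernel_sizes)  # save this list (or its string form) in your config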


Oh, thank you so much. I never thought one line of code could finish this work. Anyway, thank you so much for answering so many of my questions. You are so kind. Thank you so much.


Hello ptrblck,
I am trying to connect 3 different GCNs together, but the solution you propose here does not work.
Here is my code, a very simple GCN:

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(300, 256)
        self.conv2 = GCNConv(256, 1)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        return x

Then we instantiate all of the variables necessary to train the model:

model1 = Net().cpu()
model2 = Net().cpu()
model3 = Net().cpu()

set hyperparameters

optimizer = torch.optim.Adam(model1.parameters(), lr = 0.01)
criterion1 = MSELoss()

loading subsets for each graph layer

data_L3 = data['graph1']
data_L4 = data['graph2']
data_L5 = data['graph3']

create different subsets for each graph layer

loader_L3 = RandomNodeSampler(data_L3,2)
loader_L4 = RandomNodeSampler(data_L4,2)
loader_L5 = RandomNodeSampler(data_L5,2)

then

def train(loader, model):
    for epoch in range(2):
        running_loss = 0.0
        model.train()
        for data in loader:
            data = data.cpu()
            # change to float to match with the weights of the class model
            data.x = data.x.float()
            logits = model1(data.x, data.edge_index)
            loss = criterion1(logits, data.y)
            optimizer.step()
            running_loss += loss.item()
        # we print out here if we want to see the loss
        print(epoch + 1, running_loss / 2000)
        running_loss = 0.0
    return model

Then train the 3 models, each on the correct data subset:

train(loader_L3,model1)
train(loader_L4,model2)
train(loader_L5,model3)

How can I combine them into one single model?
I tried to use
torch.stack([self.GNC1.weight, self.GNC2.weight, self.GNC2.weight], dim=0)
with no success

Can you help, please?

Could you post the error message you were seeing or what exactly is not working, please?

PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier :wink:

Could you please explain this a bit? I am relatively new to PyTorch and I didn't quite catch the part where we are looping this in.

I’m unsure which part of my post is unclear as the first one applies the models in a sequential way while the second one iterates the smaller models inside a larger one. Could you post the code that is unclear?

My bad, I should have specified this earlier. I was curious about the implementation of the smaller models iterating inside the larger one. The code is very clear, but the part I was unclear about was:

models = nn.ModuleList()
for _ in range(10):
    models.append(MySmallModel())

model = MyHugeModel(models)
x = torch.randn(1, 10)
output = model(x)

I did not understand the ModuleList(). Sorry, this may be very basic; I'm just trying to understand.

The nn.ModuleList is used to store the 10 MySmallModel instances and is then passed to MyHugeModel, where they will be registered as submodules.
You could also create the 10 MySmallModel instances inside MyHugeModel.__init__ and would still use an nn.ModuleList to register them as submodules. If you used a plain Python list, these submodules would not be registered inside the parent module and thus would not show up in its state_dict or .parameters() call.
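
A quick sketch of that difference (the Linear layers are just placeholders):

import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super(WithPlainList, self).__init__()
        self.models = [nn.Linear(10, 10) for _ in range(3)]  # plain list: not registered

class WithModuleList(nn.Module):
    def __init__(self):
        super(WithModuleList, self).__init__()
        self.models = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])  # registered

print(len(list(WithPlainList().parameters())))   # 0 -> the optimizer would see nothing
print(len(list(WithModuleList().parameters())))  # 6 (weight + bias for each of the 3 layers)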