Hello, I am new to PyTorch and this question has cost me a couple of days. I am trying to connect two different neural networks. The first model is a trained network which I have already saved as a .pth file. I want to attach another network with a completely different architecture after it, combine the two models, and train them together. How can I connect two models? I mean, simply use the output of model one as the input of model two, but use a single optimizer to train both, because I may put a dozen models after them.
Yes, that would just work.
You can basically treat each model as a “layer” passing one output to the next layer/model.
If you would like to pass all parameters to the optimizer, just concatenate the lists together:
import torch
import torch.nn as nn

model1 = nn.Linear(1, 1)
model2 = nn.Linear(2, 2)
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()),
    lr=1e-3
)
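For completeness, here is a minimal sketch (continuing from the imports above, with made-up layer sizes and names) of how the chained forward pass and the shared optimizer fit together:

m1 = nn.Linear(4, 8)
m2 = nn.Linear(8, 2)
opt = torch.optim.SGD(list(m1.parameters()) + list(m2.parameters()), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(16, 4)
target = torch.randn(16, 2)

opt.zero_grad()
out = m2(m1(x))            # output of the first model feeds the second
loss = criterion(out, target)
loss.backward()            # gradients flow through both models
opt.step()                 # one optimizer updates both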
Thank you so much. This method solves the problem, but I want to combine the two models into one model and train and optimize them together. The reason I want to do this is that I will connect a dozen models one after another. So do you have any idea how to combine different models? Thank you so much.
You can just combine them in the training loop:
for data, target in train_loader:
    output = model1(data)
    output = model2(output)
    output = model3(output)
    # or use a loop instead
or just wrap them in another nn.Module and treat them as layers:
import torch.nn.functional as F

class MyHugeModel(nn.Module):
    def __init__(self, models):
        super(MyHugeModel, self).__init__()
        self.models = models

    def forward(self, x):
        for model in self.models[:-1]:
            x = F.relu(model(x))
        x = self.models[-1](x)  # don't use relu for the last model
        return x
class MySmallModel(nn.Module):
    def __init__(self):
        super(MySmallModel, self).__init__()
        self.lin = nn.Linear(10, 10)

    def forward(self, x):
        x = self.lin(x)
        return x
models = nn.ModuleList()
for _ in range(10):
    models.append(MySmallModel())

model = MyHugeModel(models)
x = torch.randn(1, 10)
output = model(x)
Using this approach, you can pass all parameters as model.parameters() to your optimizer.
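As a quick illustration of that last point, here is a minimal sketch (using the model and x built above, and a made-up target) of passing the wrapped model's parameters to a single optimizer:

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # includes the parameters of all 10 submodels
criterion = nn.MSELoss()
target = torch.randn(1, 10)   # dummy target, just for illustration

optimizer.zero_grad()
output = model(x)
loss = criterion(output, target)
loss.backward()
optimizer.step()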
Thank you so much, this is exactly what I want. Thank you.
By the way, may I ask another question, please? I tried to make the architectures of the neural networks more complex, so in some models certain layers skip over other layers and connect to layers further ahead. How can I write this skip-connection function? Should I do this in the forward function?
Yes, the forward method is the right place to add skip connections. Have a look at the resnet implementation for an example.
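For a rough idea of what this can look like (a minimal sketch, not the actual ResNet code; SkipBlock is a made-up name), a skip connection is simply adding an earlier activation inside forward:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    def __init__(self, channels):
        super(SkipBlock, self).__init__()
        # padding=1 keeps the spatial size, so the addition below works
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        identity = x                  # keep the input for the skip connection
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity          # skip connection
        return F.relu(out)

out = SkipBlock(8)(torch.randn(1, 8, 18, 18))  # shape stays [1, 8, 18, 18]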
Thank you so much. But I ran into another problem. I let the function randomly choose different convolution layers with different kernel sizes, and I use skip connections in this function. When a skip connection is made, the sizes of the tensors are different and cannot be added together. I mean, I tried to sum the outputs of different layers when they are skip-connected, but the tensor sizes differ and PyTorch gave me this error:
The size of tensor a (26) must match the size of tensor b (18) at non-singleton dimension 3
I know this error says one tensor is 18x18 while the other is 26x26. So could you please tell me some methods to change the size of the output?
You could use interpolation (or cropping) to create equally sized activations, in a similar manner to what is currently done with downsample in the resnet example. Note that the optional downsampling changes the number of channels, not the spatial size, so you cannot copy that code directly.
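One possible way to do this (a sketch assuming only the spatial size differs, not the number of channels) is to interpolate one activation to the size of the other before adding:

import torch
import torch.nn.functional as F

a = torch.randn(1, 16, 26, 26)   # activation from one branch
b = torch.randn(1, 16, 18, 18)   # activation from the skip branch

# resize a to b's spatial size so the two can be summed
a_resized = F.interpolate(a, size=b.shape[2:], mode='bilinear', align_corners=False)
out = a_resized + b
print(out.shape)  # torch.Size([1, 16, 18, 18])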
Thank you so much, that is really helpful. Can I ask one more question, please? I wrote code that lets the system randomly choose different convolution layers with different kernel sizes, and the kernel size is the only difference between the convolution layers, for example three conv layers: 3x3, 1x1, 5x5. How can I get the kernel-size information and save it in an array or a string after I create the model?
What would you want to do with this information?
Is your model choosing one of these conv layers randomly for each batch or only once at initialization?
Well, I am working on NAS (Neural Architecture Search). My prototype chooses different types of convolution layer when I create a new model, and I need to save that information to find the best-performing architecture.
I wrote a separate file which defines the different convolution layers. A function in it returns a random layer with a random kernel size, and this function is used in the model definition file. I mean, there is a loop that adds the layers to the model, and a special forward function that allows skip connections between layers.
Ah OK, thanks for the information.
You can get the kernel size using conv.kernel_size and store it in your config file.
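For example, one way to collect this for every conv layer in a model is to iterate over its modules (a small sketch; the model below is a made-up stand-in with the three kernel sizes you mentioned):

import torch.nn as nn

# a tiny stand-in model with conv layers of different kernel sizes (illustrative only)
nas_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Conv2d(8, 8, kernel_size=1),
    nn.Conv2d(8, 8, kernel_size=5, padding=2),
)

kernel_sizes = [m.kernel_size for m in nas_model.modules() if isinstance(m, nn.Conv2d)]
print(kernel_sizes)  # [(3, 3), (1, 1), (5, 5)]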
Oh, thank you so much. I never thought one line of code could finish this work. Anyway, thank you for answering so many of my questions. You are so kind.
Hello ptrblck,
I am trying to connect 3 different GCNs together, but the solution you propose here does not work. Here is my code.
A very simple GCN:
import torch
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(300, 256)
        self.conv2 = GCNConv(256, 1)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        return x
Then:

# here we instantiate all of the variables necessary to train the model
model1 = Net().cpu()
model2 = Net().cpu()
model3 = Net().cpu()

# set hyperparameters
optimizer = torch.optim.Adam(model1.parameters(), lr=0.01)
criterion1 = MSELoss()

# load the subsets for each graph layer
data_L3 = data['graph1']
data_L4 = data['graph2']
data_L5 = data['graph3']

# create different subsets for each graph layer
loader_L3 = RandomNodeSampler(data_L3, 2)
loader_L4 = RandomNodeSampler(data_L4, 2)
loader_L5 = RandomNodeSampler(data_L5, 2)
Then:

def train(loader, model):
    for epoch in range(2):
        running_loss = 0.0
        model.train()
        for data in loader:
            data = data.cpu()
            # change to float to match the weights of the model
            data.x = data.x.float()
            logits = model(data.x, data.edge_index)  # use the model passed in, not model1
            loss = criterion1(logits, data.y)
            optimizer.zero_grad()   # clear old gradients
            loss.backward()         # compute gradients before stepping
            optimizer.step()
            running_loss += loss.item()
        # we print here if we want to see the loss
        print(epoch + 1, running_loss / 2000)
        running_loss = 0.0
    return model
Then I train the 3 models, each on the correct data subset:

train(loader_L3, model1)
train(loader_L4, model2)
train(loader_L5, model3)

How can I combine them into one single model? I tried to use

torch.stack([self.GNC1.weight, self.GNC2.weight, self.GNC2.weight], dim=0)

with no success. Can you help, please?
Could you post the error message you were seeing or what exactly is not working, please?
PS: you can post code snippets by wrapping them into three backticks ```, which would make debugging easier
Could you please explain this a bit? I am relatively new to PyTorch and I didn't quite catch the part where we loop the models in.
I’m unsure which part of my post is unclear as the first one applies the models in a sequential way while the second one iterates the smaller models inside a larger one. Could you post the code that is unclear?
My bad, I should have specified this earlier. I was curious about how the smaller models are iterated inside the larger one. The code is very clear, but the part I was unclear about was:
models = nn.ModuleList()
for _ in range(10):
    models.append(MySmallModel())

model = MyHugeModel(models)
x = torch.randn(1, 10)
output = model(x)
I did not understand the ModuleList(). Sorry, this may be very basic; I am just trying to understand.
The nn.ModuleList is used to store the 10 MySmallModels and is then passed to MyHugeModel, where it will be registered as submodules. You could also create the 10 MySmallModels inside MyHugeModel.__init__ and would also use the nn.ModuleList to register them as submodules. If you used a plain Python list instead, these submodules would not be registered inside the parent class and thus would not show up in its state_dict or .parameters() call.
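As a quick illustration of that difference (a minimal sketch reusing the MySmallModel defined earlier in this thread; the class names are made up), compare the parameter counts of the two containers:

import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super(WithModuleList, self).__init__()
        self.models = nn.ModuleList([MySmallModel() for _ in range(10)])  # registered as submodules

class WithPlainList(nn.Module):
    def __init__(self):
        super(WithPlainList, self).__init__()
        self.models = [MySmallModel() for _ in range(10)]  # plain list: NOT registered

print(len(list(WithModuleList().parameters())))  # 20 (weight and bias for each of the 10 linear layers)
print(len(list(WithPlainList().parameters())))   # 0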