How to train several network at the same time?

Hi, I am just a beginner of PyTorch, even a beginner of Python. So this question may too simple for you but I really want to get some help.
I want to train a neural network which is included sub-networks. It looks something like that.

In my opinion, this network could be implemented by constructing three simple fully-connection neural networks.
My implementation is:

#firstly, define the simple NN
class Net(nn.Module):
    def __init__(self,n_input,n_hidden1,n_hidden2):
        super(Net, self).__init__()
        self.hidden1 = nn.Linear(n_input,n_hidden1)
        self.hidden2 = nn.Linear(n_hidden1,n_hidden2)
        self.output  = nn.Linear(n_hidden2,1)

    def forward(self,x):
    x = F.tanh(self.hidden1(x))
    x = F.tanh(self.hidden2(x))
    x = self.output(x)
    return x

#construct 3 same nets
net = []
for i in range(3):

# define loss function and optimizer
optimizer = []
for i in range(3):
    optimizer.append(torch.optim.SGD(net[i].parameters(), lr = 0.2))
loss_func = torch.nn.MSELoss()

#training neural network
for step in range(300):  #training circles
    for i in range(3):
        output[:, i] = net[i](input[i, :, :])
    prediction = torch.sum(output,1)

    loss = loss_func(prediction,target)
    for i in range(3):
    for i in range(3):

after several circles, all output value become to the same value.
There must something wrong with my code. Could anyone give me some help?
Or are there any better way to implement this network?
Thanks a lot.

according to your diagram, you want to train one network that contains 3 sub networks?

but in your code, you are creating 3 separate optimizers.

I cant help wondering:

  1. why are you creating 3 optimiziers?
  2. what happens if you use just one optimizer instead?
1 Like

Oh, I have tried using only one optimizer. But
net = []
is a list but not torch.autograd.Variable, So there are no parameters in net. PyTorch reported as an error for one optimizer.

That’s why I used 3 optimizers.

sure. well, you have a couple of options. Because I htink you should be using only one optimizer

  • the more idiomatic solution might be to create another nn.Module child class, to hold the three Net modules. The rest should be straightforward

  • otherwise you can simply do something like:

    parameters = set()
    for net_ in net:
    parameters |= set(net_.parameters())


Thanks for replying, I will try these solutions and will give you a feedback.

Hi, I have just tried the second solution. It works well!

But the problem, output always become same value, still exists.

Fortunately, I solve this problem by adding an extra Batch_Normalization layer.

Thank you very much!

1 Like

Would the network learn different weights when using these seperate optimizers which all optimize the same loss, or is it just a matter of style or performance?

1 Like

Hi, I would like to share my experience in case it might be helpful for other newbies (like me) around here.
I created a bigger NN model (nn.Module child class) containing 2 smaller subnetworks.
I just defined first the simple subnetwork as a nn.module child class.
Then in the init method definition of the bigger model class, I just called/defined the 2 submodels.
They appear of course also in the forward method (with same input x in my case).
You can then add another layer to merge the solutions coming from subnetworks, in case you need.
Training goes than as usual.

Please can you share the code ? I’m beginner so I’m trying to understand codes and thank you

Hello Andrea, I have tried implementing the solution you have offered but something is bugging me.

In my case, I have two seperate networks as subnetworks and their output form a loss function together. I have initialized those subnetworks in a larger nn.Module child class and included them in the forward.

However, my reason for applying this was that I have a single GPU which seems to be under allocated (4 GiB allocation in a 32 GiB GPU) and I don’t want to allocate another GPU. But it seems like I am having problems at parallelization of my subnetworks. I was hoping that PyTorch internally allows parallelization by building the computation graph but it seems like something is wrong. My training with two subnetworks takes more than twice duration. Have you observed a similar situation?