What is the recommended way to combine sub-models in a branched neural network model

Hi! I’m trying to move my project from TensorFlow to PyTorch and I need your help with one thing. I have a model that processes numerical data. It takes 21 values and returns 11. To make it clearer, I simplified the case and presented it in the graph:

In TensorFlow I just have three tf.keras.Sequential containers that I merge like this:

concat = tf.keras.layers.Concatenate(axis=-1, name='Concatenate')([model1, model2, model3])
model = tf.keras.Model(inputs=[input_ANN], outputs=[concat])

The input is the same for every Sequential and is linked through the functional API like this:

 model1=single_model(topology1)(input_ANN)

The first Sequential has 4 outputs, the second 3 outputs and the third 4 outputs, so the total is 11 as expected. Now I am trying to do the same in PyTorch (creating the nn.Sequential containers is not a problem, so I will skip straight to the point):

model = torch.cat((model1, model2, model3), dim=1)

This throws the error “expected Tensor as element 0 in argument 0, but got Sequential”. My next try is:

model=nn.Sequential(*list(model1)+list(model2)+list(model3))

But I ended up with a “running_mean should contain 3 elements not 21” error. Could you explain the recommended way to combine these three Sequentials?

Hi,
I haven’t looked through your Keras/tf code but here’s a way to implement the attached network graph in PyTorch:

import torch
import torch.nn as nn

class Model(torch.nn.Module):
  def __init__(self):
    super(Model, self).__init__()

    self.d1 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))
    self.d2 = nn.Sequential(nn.Linear(21, 3), nn.Linear(3, 3))
    self.d3 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))

  def forward(self, x):
    return torch.cat((self.d1(x), self.d2(x), self.d3(x)))


x = torch.randn(21)
model = Model()
y = model(x)
print(y.size()) # torch.Size([11])

Feel free to ask more if this isn’t what you intended to implement.

Thank you for the answer. I tested your code and it does not work. The error is “Sizes of tensors must match except in dimension 0. Expected size 4 but got size 3 for tensor number 1 in the list”. Here is my training loop:

for e in range(epochs):
    train_loss = 0.0
    model.train()     # Optional when not using Model Specific layer
    for data, labels in train_dataloader:
        optimizer.zero_grad()
        target = model(data)  # the code crashes here
        loss = MSELoss(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

The dataloader is fine. When I pass its data through a linear model (a simple single nn.Sequential with 21 inputs and 11 outputs) it works, and the ANN is learning. Here it is, just for clarification:
train_dataloader = torch_data.DataLoader(dataset=train, batch_size=batch_size, shuffle=True)

You need to make sure the tensors that your sub-networks output can be concatenated. Can you print out the shape of all these tensors?
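A minimal sketch of that shape check, assuming a random batch of 55 examples with 21 features and the same three branches as in the code above (the names `d1`, `d2`, `d3`, `outs` and `joined` are only illustrative). It prints each branch's output shape and shows that concatenating along the default dimension 0 fails, while dimension 1 works:

```python
import torch
import torch.nn as nn

# Same three branches as in the answer above, built standalone for inspection.
d1 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))
d2 = nn.Sequential(nn.Linear(21, 3), nn.Linear(3, 3))
d3 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))

x = torch.randn(55, 21)  # assumed batch: 55 examples, 21 features each

outs = [branch(x) for branch in (d1, d2, d3)]
shapes = [tuple(o.shape) for o in outs]
print(shapes)  # [(55, 4), (55, 3), (55, 4)]

# torch.cat defaults to dim=0, which requires all other dimensions to match.
try:
    torch.cat(outs)
except RuntimeError as err:
    print("dim=0 fails:", err)

# Concatenating along the feature dimension joins 4 + 3 + 4 = 11 columns.
joined = torch.cat(outs, dim=1)
print(joined.shape)  # torch.Size([55, 11])
```

The branch outputs differ only in their last dimension, so that is the only dimension along which they can be concatenated.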

Here is the print(model) output

Model(
  (d1): Sequential(
    (0): Linear(in_features=21, out_features=4, bias=True)
    (1): Linear(in_features=4, out_features=4, bias=True)
  )
  (d2): Sequential(
    (0): Linear(in_features=21, out_features=3, bias=True)
    (1): Linear(in_features=3, out_features=3, bias=True)
  )
  (d3): Sequential(
    (0): Linear(in_features=21, out_features=4, bias=True)
    (1): Linear(in_features=4, out_features=4, bias=True)
  )
)

My Input is 21x55, output is 11x55

The code I posted is to give you a demonstration of how you can construct this network in PyTorch and it works. For your tensors (you must be using batches), you need to take care of the shapes while concatenating by specifying the dimension on which to concatenate.

See this example where the input tensor is 55*21 (55 examples in one batch):

class Model(torch.nn.Module):
  def __init__(self):
    super(Model, self).__init__()

    self.d1 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))
    self.d2 = nn.Sequential(nn.Linear(21, 3), nn.Linear(3, 3))
    self.d3 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))

  def forward(self, x):
    return torch.cat((self.d1(x), self.d2(x), self.d3(x)), dim=1) # specify the dimension here

x = torch.randn(55, 21)
model = Model()
y = model(x)
print(y.size()) # torch.Size([55, 11]) -- it works
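If the three nn.Sequential containers already exist, a small wrapper module is one possible way to combine them without rewriting them. The sketch below is an assumption-laden illustration, not the only way to do it: `Branched` is a hypothetical name, and `model1`, `model2`, `model3` are stand-ins for the original containers. It concatenates along the last dimension (`dim=-1`, mirroring the Keras `axis=-1`), so the same module handles both a batched and an unbatched input:

```python
import torch
import torch.nn as nn


class Branched(nn.Module):
    """Hypothetical wrapper: feeds the same input to every branch and
    concatenates the branch outputs along the last dimension."""

    def __init__(self, *branches):
        super().__init__()
        # nn.ModuleList registers the branches so their parameters are
        # visible to optimizers, .to(device), state_dict(), etc.
        self.branches = nn.ModuleList(branches)

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=-1)


# Stand-ins for the three existing containers (4 + 3 + 4 = 11 outputs).
model1 = nn.Sequential(nn.Linear(21, 4), nn.ReLU(), nn.Linear(4, 4))
model2 = nn.Sequential(nn.Linear(21, 3), nn.ReLU(), nn.Linear(3, 3))
model3 = nn.Sequential(nn.Linear(21, 4), nn.ReLU(), nn.Linear(4, 4))

model = Branched(model1, model2, model3)

batched = model(torch.randn(55, 21))
single = model(torch.randn(21))
print(batched.shape)  # torch.Size([55, 11])
print(single.shape)   # torch.Size([11])
```

Note that the unbatched call only works here because nn.Linear and nn.ReLU accept 1-D inputs; branches containing layers such as nn.BatchNorm1d would still need a batch dimension.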

Thank you! Now it works, dim=1 did the trick. Here is the torchview visualization I generated just to make sure the topology is correct (batch_size=10).
[image: torchview visualization of the model]

A bit out of context: I only vaguely understand what you mean by “you must be using batches”. Does it mean that PyTorch can be used without batching data? Like I just throw everything I have at the model in one run? Every example I found treated “batch_size” as obligatory.

That would, in a sense, imply using batch_size=1, which isn’t desirable for many reasons, including inefficient use of hardware (unless, of course, hardware constraints force you to).
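To make the batching options concrete, here is a small sketch with made-up random data (55 examples, 21 inputs, 11 targets; all variable names are illustrative). It compares the tensor shapes a DataLoader yields when the whole dataset is passed in one step (batch_size equal to the dataset size), when mini-batches of 10 are used, and when batch_size=1:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data: 55 examples with 21 input values and 11 target values each.
features = torch.randn(55, 21)
targets = torch.randn(55, 11)
dataset = TensorDataset(features, targets)

full = DataLoader(dataset, batch_size=len(dataset))  # everything in one step
mini = DataLoader(dataset, batch_size=10)            # mini-batches of 10
single = DataLoader(dataset, batch_size=1)           # one example per step

full_shapes = [tuple(x.shape) for x, _ in full]
mini_shapes = [tuple(x.shape) for x, _ in mini]
single_shapes = [tuple(x.shape) for x, _ in single]

print(full_shapes)         # [(55, 21)]
print(mini_shapes)         # five batches of (10, 21), then one of (5, 21)
print(len(single_shapes))  # 55 batches, each of shape (1, 21)
```

In all three cases the tensor keeps a leading batch dimension, which is why the concatenation in the model has to happen along dimension 1 rather than dimension 0.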

What would be the best way to parallelize the sub-networks?

Take a look at the answer I marked as the solution. It handles 3 ANNs in parallel. Is that what you are looking for?

If I do a torchview visualization and it shows 3 branches in parallel, does this always imply that the branches are then executed in parallel?

If you mean parallel computing between CPU threads - I don’t know the answer. I’m just starting my journey here, maybe someone more experienced will be able to answer.

I’d like to know if the branches are processed in parallel on the GPUs, and if not, what would be the best way to parallelize the branches in such an example.

Hello,
I am also interested in whether these branches are executed in parallel on the GPU. Do we gain any speed compared with training the three sub-networks independently and sequentially and then combining the results afterwards? Thanks!

After implementing this myself, the short answer is no, they are not run in parallel, unless you vectorize the forward function that manages the branches with vmap.
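A minimal sketch of the vmap approach, following the model-ensembling pattern from `torch.func` (PyTorch 2.x). It rests on one important assumption that does not hold for the network in this thread: all branches must have an identical architecture so their parameters can be stacked. Here every branch maps 21 inputs to 4 outputs, so the merged output has 12 columns rather than 11; the names `branches`, `call_one`, `out` and `merged` are illustrative:

```python
import copy

import torch
import torch.nn as nn
from torch.func import functional_call, stack_module_state, vmap

# ASSUMPTION: identical branch architectures (unlike the 4/3/4 case above).
branches = [nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4)) for _ in range(3)]

# Stack each parameter across branches: e.g. "0.weight" becomes (3, 4, 21).
params, buffers = stack_module_state(branches)

# A parameter-free "skeleton" of one branch, used only for its structure.
base = copy.deepcopy(branches[0]).to("meta")


def call_one(p, b, x):
    # Run the skeleton with one branch's parameters and buffers.
    return functional_call(base, (p, b), (x,))


x = torch.randn(55, 21)

# Map over the stacked parameters (dim 0), share the same input x.
out = vmap(call_one, in_dims=(0, 0, None))(params, buffers, x)
print(out.shape)  # torch.Size([3, 55, 4])

# Put the branch outputs side by side, as torch.cat(..., dim=1) would.
merged = torch.cat(out.unbind(0), dim=1)
print(merged.shape)  # torch.Size([55, 12])
```

With this formulation the three branches are evaluated as batched matrix operations instead of three separate calls. Whether that is actually faster depends on the model size and hardware, so it is worth timing both versions.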