Hi! I’m trying to move my project from TensorFlow to PyTorch and I need your help with one thing. I have a model that processes numerical data: it takes 21 values and returns 11. To make it clearer, I simplified the case and presented it in the graph below:
In TensorFlow I just have three tf.keras.Sequential containers that I merge like this:
concat = tf.keras.layers.Concatenate(axis=-1, name='Concatenate')([model1, model2, model3])
model = tf.keras.Model(inputs=[input_ANN], outputs=[concat])
The inputs are the same for every single Sequential, and they are linked with the functional API like:
model1 = single_model(topology1)(input_ANN)
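Putting those pieces together, the whole TensorFlow construction looks roughly like this (the Dense layers inside single_model here are only illustrative stand-ins for my real topology1/topology2/topology3 builders):

import tensorflow as tf

def single_model(width):
    # stand-in for my real builder; the actual layers come from the topology arguments
    return tf.keras.Sequential([
        tf.keras.layers.Dense(width, activation='relu'),
        tf.keras.layers.Dense(width),
    ])

input_ANN = tf.keras.Input(shape=(21,), name='input_ANN')
model1 = single_model(4)(input_ANN)   # branch 1: 4 outputs
model2 = single_model(3)(input_ANN)   # branch 2: 3 outputs
model3 = single_model(4)(input_ANN)   # branch 3: 4 outputs

concat = tf.keras.layers.Concatenate(axis=-1, name='Concatenate')([model1, model2, model3])
model = tf.keras.Model(inputs=[input_ANN], outputs=[concat])   # (None, 21) -> (None, 11)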
The first Sequential has 4 outputs, the second 3 and the third 4, so the total sum is 11, as expected. Now I’m trying to do the same in PyTorch (creating the nn.Sequential containers is not a problem, so I will skip straight to the point):
model = torch.cat((model1, model2, model3), dim=1)
This throws the error “expected Tensor as element 0 in argument 0, but got Sequential”. My next try is:
Thank you for the answer. I tested your code and it does not work. The error is “Sizes of tensors must match except in dimension 0. Expected size 4 but got size 3 for tensor number 1 in the list”. Here is my training loop:
for e in range(epochs):
    train_loss = 0.0
    model.train()  # optional when not using layers like Dropout or BatchNorm
    for data, labels in train_dataloader:
        optimizer.zero_grad()
        target = model(data)  # the code crashes here
        loss = MSELoss(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
The dataloader is fine. When I pass the data through a linear model (a simple single nn.Sequential with 21 inputs and 11 outputs) it works, and the ANN learns. Here it is, just for clarification: train_dataloader = torch_data.DataLoader(dataset=train, batch_size=batch_size, shuffle=True)
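If it helps, the rest of my setup is essentially this (the data tensors, the MSELoss name and the optimizer below are simplified stand-ins for my actual code):

import torch
import torch.utils.data as torch_data

# simplified stand-ins for my real data: 21 input features, 11 target values per example
X = torch.randn(1000, 21)
Y = torch.randn(1000, 11)
train = torch_data.TensorDataset(X, Y)

batch_size = 64
train_dataloader = torch_data.DataLoader(dataset=train, batch_size=batch_size, shuffle=True)

# the working linear baseline: one Sequential, 21 inputs -> 11 outputs
model = torch.nn.Sequential(torch.nn.Linear(21, 11))

MSELoss = torch.nn.MSELoss()   # called as MSELoss(target, labels) in the loop above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)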
The code I posted demonstrates how you can construct this network in PyTorch, and it works. For your tensors (you must be using batches), you need to take care of the shapes when concatenating by specifying the dimension along which to concatenate.
See this example where the input tensor is 55*21 (55 examples in one batch):
import torch
import torch.nn as nn

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # three independent branches, all fed with the same 21 inputs
        self.d1 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))
        self.d2 = nn.Sequential(nn.Linear(21, 3), nn.Linear(3, 3))
        self.d3 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4))

    def forward(self, x):
        # concatenate along the feature dimension (dim=1), not the batch dimension
        return torch.cat((self.d1(x), self.d2(x), self.d3(x)), dim=1)

x = torch.randn(55, 21)
model = Model()
y = model(x)
print(y.size())  # torch.Size([55, 11]) -- it works
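To tie it back to your training loop, here is a quick check with made-up data, reusing the Model class defined above, that the concatenated output lines up with 11-value labels and can be trained with MSE:

criterion = torch.nn.MSELoss()
model = Model()                      # the class defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

data = torch.randn(55, 21)           # dummy batch: 55 examples, 21 features
labels = torch.randn(55, 11)         # dummy targets: 11 values per example

optimizer.zero_grad()
target = model(data)                 # shape (55, 11) after the dim=1 concatenation
loss = criterion(target, labels)
loss.backward()
optimizer.step()
print(loss.item())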
Thank you! Now it works, dim=1 did the trick. Here is a torchview visualization I generated just to make sure the topology is correct (batch_size=10).
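For reference, this is roughly what I ran to render it (from memory, so the torchview call may differ slightly in your version):

from torchview import draw_graph

graph = draw_graph(model, input_size=(10, 21))   # batch_size=10, 21 input features
graph.visual_graph                               # displays the diagram in a notebook; .render() saves it to a file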
Just out of curiosity - I only vaguely understand what you mean by “you must be using batches”. Does it mean that PyTorch can be used without batching the data? Like I just throw everything I have at it in one run? Every single example I have found treated “batch_size” as obligatory.
That would, in a sense, imply using batch_size=1, which isn’t desirable for many reasons, including inefficient use of the hardware (unless of course you have to do it because of hardware constraints).
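To make the options concrete, here is a toy illustration (made-up shapes): the whole dataset in one call, mini-batches through a DataLoader, or one example at a time with batch_size=1:

import torch
import torch.utils.data as torch_data

model = torch.nn.Linear(21, 11)
X, Y = torch.randn(1000, 21), torch.randn(1000, 11)
dataset = torch_data.TensorDataset(X, Y)

# "throw everything in one run": a single full-batch forward pass
out_full = model(X)                                     # (1000, 21) -> (1000, 11)

# the usual route: mini-batches via a DataLoader
for data, labels in torch_data.DataLoader(dataset, batch_size=64, shuffle=True):
    out = model(data)                                   # (64, 21) -> (64, 11) per step
    break

# one example at a time (inefficient use of the hardware, as noted above)
for data, labels in torch_data.DataLoader(dataset, batch_size=1):
    out = model(data)                                   # (1, 21) -> (1, 11)
    break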
If you mean parallel computing between CPU threads - I don’t know the answer. I’m just starting my journey here; maybe someone more experienced will be able to answer.
I’d like to know if the branches are processed in parallel on the GPUs, and if not, what would be the best way to parallelize the branches in such an example.
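I don’t have a definitive answer either, but one thing worth experimenting with is launching the branches on separate CUDA streams. By default the three branches are dispatched sequentially on the current stream (the GPU may still overlap small kernels on its own), and for layers this tiny the launch overhead may dominate, so treat the following purely as a sketch, not something I have benchmarked:

import torch
import torch.nn as nn

device = torch.device('cuda')
d1 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4)).to(device)
d2 = nn.Sequential(nn.Linear(21, 3), nn.Linear(3, 3)).to(device)
d3 = nn.Sequential(nn.Linear(21, 4), nn.Linear(4, 4)).to(device)

x = torch.randn(55, 21, device=device)
s1, s2, s3 = (torch.cuda.Stream() for _ in range(3))

torch.cuda.synchronize()              # make sure x is ready before the side streams use it
with torch.cuda.stream(s1):
    y1 = d1(x)
with torch.cuda.stream(s2):
    y2 = d2(x)
with torch.cuda.stream(s3):
    y3 = d3(x)
torch.cuda.synchronize()              # wait for all three streams before combining the results

y = torch.cat((y1, y2, y3), dim=1)
print(y.size())                       # torch.Size([55, 11])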