How to Concatenate layers in PyTorch similar to tf.keras.layers.Concatenate

I’m trying to implement the following network in PyTorch, and I’m not sure whether the method I used to combine the layers is correct. In place of the convnet in the given network, I’ve used a pretrained VGG16 model.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG16 with the last classifier layer removed
model = models.vgg16(pretrained=True)
new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.classifier = new_classifier

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 96, 8, stride=16, padding=1)
        self.maxpool1 = nn.MaxPool2d(3, 4, padding=1)
        self.conv2 = nn.Conv2d(3, 96, 8, stride=32, padding=1)
        self.maxpool2 = nn.MaxPool2d(7, 2, padding=1)

    def forward(self, x):
        out1 = model(x)
        y = self.conv1(x)
        y = self.maxpool1(y)
        y = F.normalize(y, dim=1, p=2)
        z = self.conv2(x)
        z = self.maxpool2(z)
        z = F.normalize(z, dim=1, p=2)
        out = torch.cat((out1, y, z), 1)
        return out

test = Network()

When I print the summary, I get:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 4 and 2 in dimension 2 at c:\a\w\1\s\windows\pytorch\aten\src\th\generic/THTensorMoreMath.cpp:1333

Where am I going wrong?


The shapes of the tensors you wish to concatenate do not match.

> torch.Size([1, 4096])
> torch.Size([1, 96, 4, 4])
> torch.Size([1, 96, 2, 2])

Based on the image you’ve posted it seems the conv activations should be flattened to a tensor with the shape [batch_size, 2 * 4*4*96 = 3072].
You could add this using:

y = y.view(y.size(0), -1)
z = z.view(z.size(0), -1)
out = torch.cat((out1, y, z), 1)

However, even then the architecture won’t match, since z is only [batch_size, 96, 2, 2], which flattens to 384 features instead of 1536.
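As a quick check of the shapes involved, here is a minimal sketch that uses random tensors as stand-ins for the three activations printed above (the names and the batch size of 1 are just assumptions for illustration). It shows that the concatenation runs once the conv branches are flattened, but that the second branch contributes fewer features than the first:

```python
import torch

batch_size = 1
# Random stand-ins with the shapes reported above (not real activations)
out1 = torch.randn(batch_size, 4096)    # VGG16 features
y = torch.randn(batch_size, 96, 4, 4)   # first conv branch
z = torch.randn(batch_size, 96, 2, 2)   # second conv branch

# Flatten everything except the batch dimension
y_flat = torch.flatten(y, start_dim=1)  # 96 * 4 * 4 = 1536 features
z_flat = torch.flatten(z, start_dim=1)  # 96 * 2 * 2 = 384 features

# All tensors are now 2-D, so they can be concatenated along dim 1
out = torch.cat((out1, y_flat, z_flat), dim=1)
print(y_flat.shape, z_flat.shape, out.shape)
```

torch.flatten(t, start_dim=1) does the same job as t.view(t.size(0), -1) here.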


So, what do you recommend I do?
I want to implement a model like this:

convnet_model = convnet_model_()
first_input = Input(shape=(224,224,3))
first_conv = Conv2D(96, kernel_size=(8, 8),strides=(16,16), padding='same')(first_input)
first_max = MaxPool2D(pool_size=(3,3),strides = (4,4),padding='same')(first_conv)
first_max = Flatten()(first_max)
first_max = Lambda(lambda  x: K.l2_normalize(x,axis=1))(first_max)
second_input = Input(shape=(224,224,3))
second_conv = Conv2D(96, kernel_size=(8, 8),strides=(32,32), padding='same')(second_input)
second_max = MaxPool2D(pool_size=(7,7),strides = (2,2),padding='same')(second_conv)
second_max = Flatten()(second_max)
second_max = Lambda(lambda  x: K.l2_normalize(x,axis=1))(second_max)
merge_one = concatenate([first_max, second_max])

merge_two = concatenate([merge_one, convnet_model.output])
emb = Dense(4096)(merge_two)
l2_norm_final = Lambda(lambda  x: K.l2_normalize(x,axis=1))(emb)

final_model = Model(inputs=[first_input, second_input, convnet_model.input], outputs=l2_norm_final)

Thanks for the code.
It looks like the padding of your second max pooling layer is wrong, since you are using padding='same' in Keras.
Try the definition self.maxpool2 = nn.MaxPool2d(7, 2, padding=3) and your output will be [batch_size, 96, 4, 4] for both branches.
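The sizes can also be checked by hand. The sketch below assumes a 224x224 input (as in the Keras code) and uses the usual output-size formula for Conv2d and MaxPool2d with default dilation and ceil_mode=False, floor((n + 2 * padding - kernel) / stride) + 1. The helper name out_size is made up for this example:

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv/pool layer (dilation=1, ceil_mode=False)."""
    return (n + 2 * padding - kernel) // stride + 1

# Branch 1: Conv2d(3, 96, 8, stride=16, padding=1) -> MaxPool2d(3, 4, padding=1)
branch1 = out_size(out_size(224, 8, 16, 1), 3, 4, 1)

# Branch 2: Conv2d(3, 96, 8, stride=32, padding=1) -> MaxPool2d(7, 2, padding=1)
branch2_old = out_size(out_size(224, 8, 32, 1), 7, 2, 1)

# Branch 2 with the suggested MaxPool2d(7, 2, padding=3)
branch2_new = out_size(out_size(224, 8, 32, 1), 7, 2, 3)

print(branch1, branch2_old, branch2_new)  # 4 2 4
```

With padding=3 both branches end up at 4x4, so each flattens to 96 * 4 * 4 = 1536 features.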


Hi. I want to concatenate the last layers before the fc layer of two models, ResNet and DenseNet, but it gives me an error. Could you please write this code as an example? I don’t know how to write it correctly. Thank you so much.

Could you share your code and explain your approach as well as what is currently not working, please?


I have a problem with training and I don’t know what I should do. I would appreciate your help. Thank you.

Assuming wrapping the model into the nn.Sequential container works fine, the code looks alright.
I would additionally recommend to add an activation function between the linear layers.

Note that some models use the functional API in their forward method, which could break the model if you just slice the children and add them into nn.Sequential. In that case you could replace the classifier or fc module with an nn.Identity module.
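Here is a minimal sketch of that idea. TinyNet is a hypothetical stand-in for a pretrained model whose forward uses functional calls, and Combined, the layer sizes, and the input shape are all assumptions for illustration, not your actual code. The fc module of each backbone is replaced with nn.Identity, the two feature tensors are concatenated, and a small head with an activation between the linear layers is applied:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical backbone; its forward uses functional calls,
    so slicing its children into nn.Sequential would drop them."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = F.adaptive_avg_pool2d(x, 1)
        x = torch.flatten(x, 1)
        return self.fc(x)

class Combined(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.a = TinyNet()
        self.b = TinyNet()
        # Keep each forward intact, but return the 8 features before fc
        self.a.fc = nn.Identity()
        self.b.fc = nn.Identity()
        self.head = nn.Sequential(
            nn.Linear(8 + 8, 32),
            nn.ReLU(),  # activation between the linear layers
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        feats = torch.cat((self.a(x), self.b(x)), dim=1)
        return self.head(feats)

model = Combined()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)
```

Because both backbones are registered as submodules of Combined, their parameters show up in model.parameters() and are passed to the optimizer together with the head.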


Thank you so much for your reply. About this part of your answer:

Assuming wrapping the model into the nn.Sequential container works fine, the code looks alright.
I would additionally recommend to add an activation function between the linear layers.

I tried nn.Sequential(nn.Linear…,…) once, but it didn’t make any difference. It’s a difficult problem, and I’m new to PyTorch, so I don’t know how to solve it.

I would like to know what the impact would be if we put the nn.Sequential models inside the Network class as well.